THE EFFECT OF CHANGES IN QUALITY 
OF ILLUMINATION UPON VISUAL 
PERCEPTION¹ 


S. W. FERNBERGER, M. S. VITELES, and W. R. CARLSON 


University of Pennsylvania 


THE PROBLEM 


THE experiment described in this report grows out of a 
problem uncovered by one of the authors (Viteles) in the 
course of an industrial investigation. It represents an 
attempt to apply psychophysical measurement techniques in the 
analysis of visual perception with the view of applying the findings 
to promote the production and adjustment of workers in a specific 
occupation. 

The occupation involved is that of specking, found in a textile 
plant engaged in weaving white flannel cloth. After the cloth 
has been woven, scoured, and bleached, it is necessary to inspect 
and to remove from the flannel all dark hairs and foreign material 
that may have been woven into the cloth. Although wool is 
sorted before it is spun and woven, the sorting process does not 
succeed in removing all dark hairs and all foreign material. It 
is the failure of sorters to remove these impurities that creates a 
need for the specking operation in the manufacture of better grade 
white flannels. 

In the plant under consideration it was found that the cloth was 
specked under a variety of lights. Over a number of the specking 
tables were hung lighting-fixtures equipped with standard Mazda 
lamps, which give a white light with a distinctly yellow tinge. 
Other specking tables were equipped with the so-called (blue) 
daylight lamp, technically known as C2 Mazda lamp. The choice 
of lamps was largely left to the caprice of the worker or of the 
foreman. Most workers were using the C2 Mazda lamp (blue 
daylight). The general opinion of workers and supervisors was 
that with this type of lamp it was easier to see impurities, 
particularly fine black hairs, and that there was less eye-strain than 
with the standard Mazda lamp. At the same time some of the 
workers continued to use standard lamps because of their 
preference for the latter type of light. 

¹This experiment was made possible through the co-operation of the Faulkner 
& Colony Mfg. Co., Keene, New Hampshire. 

No figures were available to show the influence of each type 
of illumination from the viewpoint of production. An attempt is 
being made under ordinary working conditions to gather data 
useful for this purpose. At the same time it seemed desirable 
and of interest to investigate the relative suitability of the two 
types of lights under controlled laboratory conditions. It was for 
these reasons that the present experiment was undertaken. At 
the same time, the fact that there was available a mercury vapor 
lamp of the type used in certain textile mills made it possible 
to include this with the other types of light that were being 
investigated in the laboratory. 


APPARATUS, PROCEDURE AND SUBJECTS 


The procedure employed in the experiment was as follows: 

Pieces of bleached white flannel taken from the same cut of 
cloth were pasted on white cardboard. In the center of each piece 
of flannel was inserted a 4 inch length of gray thread (Clark’s 
Mercerized Sheer Fabric, Color 25). Seven stimulus cards were 
employed. On the first a single strand of thread was sewed; on 
the second, a double strand; on the third, a triple strand, etc., 
giving a series of 1 to 7 strands of thread inclusive. In addition 
2 pieces of flannel free from gray thread were employed as control 
stimuli. Throughout the experiment particular care was taken 
to keep the white flannel free from smudge so as to avoid variations 
in the brightness of the white flannel background. 

Cards were inserted in a modified Whipple tachistoscope which 
allowed approximately a 3 inch square of white flannel to be 
exposed. Subjects were seated before the tachistoscope and given 
the following instructions: “You will be shown a number of white 
cards. On certain of these there is a short gray line in the middle 
of the card. You will report immediately after each exposure 
whether or not there was a line on the card in terms of the three 
following categories: ‘YES, NO, DON’T KNOW.’” Before 
exposing the stimulus card the experimenter gave a “Ready” signal 
and then a “Now” signal, the former approximately 2 seconds and the latter 
approximately 1 second prior to the exposure of the card. 

²By M. S. Viteles, who acts as Consulting Psychologist for the Faulkner & 
Colony Mfg. Co. 

Each subject sat at a constant distance from the tachistoscope, 
the stool being placed 5 feet from the exposure surface. The ex- 
periment was performed in a dark room. No series was started 
until the subject had been in the room for a period sufficiently 
long to produce partial dark adaptation. Lights were not switched 
on during the rest periods. Screens were set up to keep light 
reflected from the rear of the tachistoscope out of the subject's 
line of vision so that throughout the experiment no subject was 
disturbed or distracted by such reflected light. 

The subjects of the experiment were 6 students, members of 
the class in industrial psychology given by Dr. Viteles. Two 
were graduate students (H and W) and the remaining 4 (Z, T, 
W2 and B) were advanced undergraduates. None of the subjects 
was familiar with tachistoscopic procedure or with the purpose 
of the experiment. Before the actual experiment, two 
or more practice sessions were given to each subject to familiarize 
him with the use of the tachistoscope and to determine the 
appropriate exposure time. Although these times varied considerably at 
first, it was found as a result of this preliminary work that the 
same exposure time was suitable for all subjects, and a constant 
exposure time of 50 sigma (50 milliseconds) was employed throughout the 
experiment. 

The chief object of the experiment was to determine the effect 
of differences in quality of lighting upon the discrimination 
threshold. In order to limit the investigation to an examination 
of the effect of quality of lighting alone, it was necessary to control 
carefully (1) the amount and diffusion of illumination on the 
surface of the exposed stimulus and (2) the voltage of the source 
of light, inasmuch as variations in voltage affect both the quantity 
of illumination (as expressed in foot-candles) and the quality 
(as expressed in terms of the color of reflected light). Such 
controls were maintained throughout the experiment by setting 
into the lamp circuit a resistance unit for maintaining voltage, 
and by making periodic measurements of the illumination of 
the exposed surface by means of a foot-candle meter. The source 
of illumination was so arranged that there was practically no 
variation in the amount of illumination over the exposed surface. 
Careful measurements showed the variability in diffusion of 
illumination to be less than 5 per cent.³ 

Four conditions of lighting were employed in the experiment. 
These were as follows: 

Condition A. 18 foot-candles of illumination from standard 
Mazda lamps. 

Condition B. 18 foot-candles of illumination from C2 Daylight 
Mazda lamps. 

Condition C. 18 foot-candles of illumination from a Mercury 
lamp. 

Condition D. 8 foot-candles of illumination from C2 Daylight 
Mazda lamps. 


The object of including Condition D was to determine roughly 
the effect of decrease in the amount of illumination upon the 
discrimination threshold. 

Each subject had 2 sittings on each of the 4 conditions of 
illumination. Each sitting involved 252 judgments, i.e., 28 judg- 
ments on each of the 9 comparison stimuli. The trials for each 
subject were arranged in a pre-determined order so as to rule 
out the influence of training in comparing the results obtained 
under the 4 conditions of illumination. 


DISCUSSION OF RESULTS 


The results of all 6 subjects fell into curves which sufficiently 
approximated those of the psychometric functions to justify the 
application of Urban’s method of calculation for the method of 
constant stimuli. In the case of observers W2 and T, the calculation 
of limens is in the nature of an extrapolation rather than the 
calculation of a median somewhere near the center of the curve. 
(Table I) 

³The authors acknowledge the helpfulness of A. R. Brainerd, Lighting Engineer, 
Philadelphia Electric Company, in furnishing equipment for making such 
measurements and for otherwise co-operating in this experiment. 
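
The computation of a limen from constant-stimuli data may be illustrated by a minimal sketch. It assumes a cumulative normal psychometric function and fits it by an unweighted straight line through the z-transformed proportions of “YES” judgments; the weighting that Urban’s procedure is usually described as applying is omitted for brevity, so this is a simplification and not the authors’ calculation. The proportions and the function name `estimate_limen` are hypothetical and are not data from this experiment.

```python
from statistics import NormalDist


def estimate_limen(strands, p_yes):
    """Fit z(p) = slope * x + intercept and return (limen, sigma).

    The limen is the stimulus size at which "YES" is reported half the time.
    Proportions of exactly 0 or 1 are skipped because their z-value is infinite.
    """
    nd = NormalDist()
    pts = [(x, nd.inv_cdf(p)) for x, p in zip(strands, p_yes) if 0.0 < p < 1.0]
    if len(pts) < 2:
        raise ValueError("need at least two proportions strictly between 0 and 1")
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_z = sum(z for _, z in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in pts)
    slope = sxz / sxx
    intercept = mean_z - slope * mean_x
    limen = -intercept / slope   # z = 0 corresponds to p = 0.50
    sigma = 1.0 / slope          # spread of the fitted normal curve
    return limen, sigma


# Hypothetical proportions of "YES" for cards carrying 1 to 7 strands.
strands = [1, 2, 3, 4, 5, 6, 7]
p_yes = [0.07, 0.21, 0.46, 0.71, 0.89, 0.96, 0.99]
limen, sigma = estimate_limen(strands, p_yes)
print(f"limen = {limen:.2f} strands, sigma = {sigma:.2f}")
```

When the fitted limen falls below the smallest stimulus or above the largest, the value is an extrapolation of the kind noted for observers W2 and T.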

Although in most cases there are no pronounced variations be- 
tween the limens for single observers there are certain consistent 
tendencies from one observer to another which will bear brief 
discussion. These are as follows: 

(1) In general, the limens for the mercury lamp are larger 
than those obtained with other conditions of illumination. It 
seems possible to conclude that this is the least efficient type of 
illumination under the conditions of this experiment. 

(2) There is an equally strong tendency for 8 foot-candles of 
illumination from C2 Mazda lamps (blue daylight) to give the 
smallest limens. This type of illumination can therefore be de- 
scribed as the most efficient under the conditions of the experiment. 

(3) Eighteen foot-candles of illumination from both standard 
Mazda lamps and C2 Mazda lamps give results of intermediate 
magnitude for all observers. Whatever difference exists between 
these two types of lamp seems to favor the (blue) daylight illumina- 
tion supplied by the C2 Mazda lamp. 

The finding that the use of the C2 Mazda lamp of low intensity 
leads to better discrimination than any other type of illumination 
might suggest the desirability of using blue daylight lamps of 
low intensity in order to promote a better quality of specking in 
the industrial situation. Although the results may support this 
conclusion with respect to the quality of light, there is reason to 
suspect the validity of these findings with respect to the quantity 
or intensity of illumination. The findings with respect to intensity, 
it is important to note, are in marked conflict with those of 
experiments by Hess and Harrison,⁴ Luckiesh and Moss,⁵ White, 
Britten, Ives, Thompson,⁶ Goldstern and Putnoky,⁷ Davies, Weston 
and Taylor,⁸ and others who have found a direct relationship 
between intensity of illumination and efficiency of work as 
measured by quantity and quality of output. 


TABLE I 
Thresholds 

                                                     Subjects 
Lamp                Foot-Candles  H         W         [illeg.]  [illeg.]  W2        [illeg.] 
Standard Mazda      18            [illeg.]  [illeg.]  2.47      2.21      1.16      [illeg.] 
Standard Mazda      18            0.97      2.82      1.66      1.77      1.02      0.71 
Mercury             18            2.63      2.22      3.80      2.95      1.08      0.93 
Mercury             18            1.54      2.68      2.94      1.82      1.21      0.81 
C2 Daylight Mazda   18            2.10      3.08      2.03      2.39      0.98      0.60 
C2 Daylight Mazda   18            [illeg.]  [illeg.]  [illeg.]  [illeg.]  [illeg.]  [illeg.] 
C2 Daylight Mazda    8            1.49      2.14      3.77      0.96      0.81      0.47 
C2 Daylight Mazda    8            1.41      1.63      1.00      1.45      0.01      0.44 

This discrepancy in findings may be explained by the specific 
conditions of the present experiment as contrasted with those of 
other investigators. The latter were performed in the industrial 
plant under actual working conditions. They involved continuing 
performance over a period of time under conditions of general 
artificial illumination. The fact that the present experiment in- 
volved work of short duration in a dark room probably explains 
the discrepancy in results and also points to the inapplicability of 
the findings of this laboratory investigation, with respect to in- 
tensity of illumination at least, to the industrial situation. 

In a general way, it appears that two factors of psychological 
importance play a predominant rôle in producing the results on 
intensity growing out of this experiment. The first is the generally 
observed tendency for an increase in attention where the intensity 
of stimuli is decreased. Although such increased attention can 
be sustained over a short period without undue fatigue and 
eye-strain (when the stimulus is visual in character), it is highly 
questionable whether it can be equally well sustained over a long 
period without ultimately producing both decreased efficiency and 
serious effects in the way of unnecessary fatigue. 

⁴D. P. Hess and W. Harrison, “Relation of Illumination to Production,” Trans. 
I. E. S., 18 (1923), pages 787 ff. 

⁵M. Luckiesh and F. K. Moss, Seeing, Baltimore, (1932), p. 6. 

⁶L. R. White, R. H. Britten, J. E. Ives and L. R. Thompson, “Studies in 
Illumination, II. Relationship of Illumination to Ocular Efficiency and Ocular 
Fatigue among the Letter Separators in the Chicago Post-Office,” U. S. Pub. 
Health Serv., Pub. Health Bull. 181, 1929, p. 58. 

⁷N. Goldstern and E. Putnoky, “Arbeitstechnische Untersuchungen über die 
Beleuchtung von Webstühlen,” Ind. Psychot., 7 (1930), pp. 321-28; “Beleuchtung 
und Leistung am Webstuhl,” Ind. Psychot., 7 (1930), pp. 353-73. 

⁸H. Davies, “Lighting in the Factory,” J. Nat. Inst. Ind. Psych., 3 (1927), pp. 
377-85; H. C. Weston and A. K. Taylor, “The Relation between Illumination 
and Efficiency in Fine Work,” Ind. Fat. Res. Bd. and Illum. Com. Joint Rep., 
London, 1926, p. 11. 

The second factor of possible importance is the tendency toward 
increased visual sensitivity for short-length waves in twilight or 
with dark adapted vision—the so-called Purkinje phenomenon. 
Experiments by Kravkov,⁹ Bader and Jaensch,¹⁰ Sloan,¹¹ and 
others favor the conclusion that the discriminatory power of twi- 
light vision increases with the dark adaptation of the eye. It is 
possible that this actually explains the lowered limens in illumina- 
tion with the C2 Mazda lamp, which tends to be saturated with 
blue. This is possibly combined with the attention factor in im- 
proving discrimination with lower intensity of illumination by 
the C2 Mazda lamp. However, in the light of this situation, the 
tentative conclusion cited above that the C2 Mazda lamp is prefer- 
able under actual industrial working conditions, where the factor 
of dark adaptation is absent, may itself be questionable. 


⁹S. W. Kravkov, “The Discriminatory Power of the Periphery of the Retina in 
Twilight Vision,” Graefes Arch. F. Ophth., 127 (1931), pp. 86-99. 

¹⁰F. Bader and E. Jaensch, “On the Stratified Structure and Evolution of the 
Psychophysical Organization. III. The Two Roots of the Purkinje Phenomenon and 
Their Intimate Relation,” Zsch. F. Psychot., 115 (1930), pp. 117-45. 

¹¹L. L. Sloan, “Effect of Intensity of Light, State of Adaptation of the Eye, 
and Size of Photometric Field on the Visibility Curve: A Study of the Purkinje 
Phenomenon,” Psych. Monog., 38 (1928), p. 87. 





THE EIGHTH INTERNATIONAL 
PSYCHOTECHNIC CONGRESS, PRAGUE, 
SEPTEMBER 11-15, 1934 


THE Congress was really to have been held a year ago 
in Vienna. The German colleagues found the regulations 
for leaving their country so unfavorable that the 
place of meeting was changed. The hospitable Czechs offered 
their beautiful capital city. Only six of the German 
psychotechnologists came, and only one university professor contributed 
something, occasioning vigorous comment by so doing. Some 
French scholars were refused transit over Germany. 

The President of the Republic, Masaryk, took over the pro- 
tectorate of the Congress. At the opening meeting, the Ministers 
of public instruction, of public works, and the representative of 
Dr. Benes participated. The National Assembly was represented 
by the president of the Senate. From the reports of Minister 
Krcmar and Professor Seracky, president of the Congress, it was 
revealed that the idea of psychotechnic examinations has taken 
deep root in Czechoslovakia. They have been established as 
obligatory for all students in the first semester of Liberal Arts 
Colleges. Preliminary trials have been made to extend them next 
to the graduates of the secondary and lower secondary schools. 

The general secretary of the International Psychotechnic 
Congress, reporting on significant events in the Congress for the last 
three years, presented the resignations of two members of the 
directors’ committee, Professor William Stern and Dr. O. Lip- 
mann (deceased), who lost their teaching positions in Germany 
on account of their race. The resignations were not accepted. 
Only scientific or scholarly merits determine membership. 
Professor Stern is now considered as a representative of world 
science. 

Professor Lahy (Paris) spoke of the application of psychology 
in business and industry. He briefly sketched the history of 
psychotechnic, its tendencies, methods and results, emphasizing 
the autonomy of psychotechnic as a science; and yet the necessity 
of a close alliance with other sciences such as medicine. 

Franziska Baumgarten (Bern) in discussing the Ideology of 
Psychotechnic explained the rapid development of psychotechnic 
as due to its practical results and also to the purpose it serves, 
“to put the right man in the right place.” One of the con- 
sequences of this is that it strives to judge the applicant only on 
the grounds of his capacity for work; to see him in the light of 
his qualifications and propensities—without his social and do- 
mestic constraints. Just as the preservation of life and health is 
the supreme law of medicine, which unites the physicians of all 

(Continued on page 656) 





RELIABILITY AND HALO EFFECT OF 
HIGH SCHOOL AND COLLEGE 
STUDENTS’ JUDGMENTS OF THEIR 
TEACHERS 


H. H. REMMERS 
Purdue University 


I. THE PROBLEM 


THE present study was undertaken to obtain data bearing 
on the following questions. 

(1) How reliable are the judgments of high school 
pupils when they judge classroom traits of their teachers? Re- 
liability is defined in this study to mean the amount of agreement 
or correlation between the judgments of randomly selected pairs 
of students in their judgments upon the same trait for the same 
teacher. Such a correlation will obviously yield a measure of 
the reliability of an individual student’s judgment of a single 
trait. 

(2) To what extent is “halo effect” present in the judgments 
of high school pupils? By “halo effect” is meant an emotional 
or affective constant or attitude which causes a judge to rate a 
given individual in terms of his like or dislike for the given 
person. If the individual is liked, the halo effect would appear 
in relatively high ratings for all desirable traits; if disliked, rela- 
tively low ratings for all such traits; and, as a consequence, high 
intercorrelations of different traits. 

(3) To what extent is “halo effect” present in the judgments 
of college students? 


II. REVIEW OF PREVIOUS STUDIES 


In a previous study¹ there was reviewed briefly the previous 
literature on the problem of the applicability of the 
Spearman-Brown prophecy formula to human judgments. The findings in 
general may be summarized by a quotation in the introduction of 
that study: “‘the Spearman-Brown prediction formula shows it 
to give meaningful prediction on such materials as mental test 
items, spelling words, [judgments of] lifted weights, true-false 
items in language, and component units of rating scales.’” 

¹Remmers, H. H., “The Equivalence of Judgments to Test Items in the Sense 
of the Spearman-Brown Formula,” Journal of Educational Psychology, 1931, 
XXII, 1, pp. 66-71. 

The Spearman-Brown prophecy formula perhaps requires brief 
explanation. It was originally developed as a statement of the 
relationship of the reliability of a given length of a test to the 
increase in reliability with an increase in length when certain 
conditions of similarity of test material hold.² 
In Kelley’s notation, the formula is 

$$r_{aA} = \frac{a\,r_{1I}}{1 + (a-1)\,r_{1I}}$$

which gives the “correlation between the average score on a forms of a test and 
a other similar forms.” Here $r_{1I}$ is the correlation between one form and 
one other similar form. 


In the study previously mentioned, the hypothesis was tested 
that judgments of college students were analogous to test items 
in the sense of the Spearman-Brown formula. This hypothesis 
was found to be correct. 
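
The analogy between judgments and test items can be made concrete with a short sketch of the Spearman-Brown relation, in which a judges take the place of a test forms. The function name `spearman_brown` and the single-judge reliability of .30 are hypothetical, chosen only to show how the predicted reliability of an averaged rating rises with the number of judges.

```python
def spearman_brown(r_single, a):
    """Predicted reliability of the average of `a` comparable judgments,
    given the reliability `r_single` of one judgment."""
    if a < 1:
        raise ValueError("a must be at least 1")
    return a * r_single / (1 + (a - 1) * r_single)


# Hypothetical single-judge reliability, not a value from the study.
r_one_judge = 0.30
for a in (1, 5, 10, 20):
    print(f"{a:2d} judges: predicted reliability = {spearman_brown(r_one_judge, a):.3f}")
```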

The experimental data in the above-mentioned study further 
corroborated these findings for the specific situation in which 
judgments of college students concerning the teaching traits of 
their instructors were obtained when measured on the Purdue 
Rating Scale for Instructors. The observed reliabilities of students’ 
judgments corresponded within the limits of allowable sampling 
error with values of r predicted from the formula. It was further 
concluded that 

“The three traits sampled in this investigation vary signifi- 
cantly in reliability. Stimulation of Intellectual Curiosity, for 
example, means more different things to students than does the 
trait Presentation of Subject-Matter. 

“In general, ratings by from ten to twenty students on a single 
trait for instructors differing sensibly in the amount of the trait 
possessed yield reliabilities which compare rather favorably with 
the reliabilities reported for standardized mental and educational 
tests. 

“It is probable that in the majority of situations in which 
subjective judgments are used: personnel ratings, stock judging, 
debate judging, beauty contests, jury verdicts, political polls, 
etc. . . . the Spearman-Brown prophecy formula indicates the 
number of judgments required for a given reliability.” 

²Kelley, T. L., Statistical Method, Macmillan Co., 1923, pp. 205-208. 

As has already been pointed out, the judges in the previous 
study were college students, and the study concerned itself with 
the question of reliability³ of the judgments of traits as defined 
on the Purdue Rating Scale for Instructors,⁴ a copy of which 
follows. 

THE PURDUE RATING SCALE FOR INSTRUCTORS 

Note to Instructors: [partly illegible] ...tions be given to the students. The 
rating scale should be passed out [illegible] beginning of the period. 

Note to Students: Following is a list of qualities that, taken together, tend to 
make any instructor the sort of instructor that he is. Of course, no one is ideal 
in all of these qualities, but some approach this ideal to a much greater extent 
than do others. In order to obtain information which may lead to the improvement 
of instruction, you are asked to rate your instructor on the indicated qualities 
by making a check (✓) on the line at the point which most nearly describes him 
with reference to the quality you are considering. For example, under Interest in 
Subject, if you think your instructor is not as enthusiastic about his subject as 
he should be, but is usually more than mildly interested, place the check on the 
scale thus: 

[Sample scale line bearing a check mark] 
Always appears full of his subject. | Seems mildly interested. | Subject seems irksome to him. 

This rating is to be entirely impersonal. Do not sign your name or make any other 
mark on the paper which could serve to identify the rater. 

Be sure to put your check on the line where you think it should be to express your 
judgment of the instructor. 

1. Interest in Subject 
   Always appears full of his subject. 
   Seems mildly interested. 
   Subject seems irksome to him. 

2. Sympathetic Attitude toward Students 
   Always courteous and considerate. 
   Tries to be considerate but finds it difficult at times. 
   Entirely [illegible] and inconsiderate. 

3. Fairness in Grading 
   Absolutely fair and impartial to all. 
   Shows occasional favoritism. 
   Constantly shows partiality. 

4. Liberal and Progressive Attitude 
   Welcomes differences in viewpoint. 
   Biased on some things but usually tolerant. 
   Entirely intolerant, allows no contradiction. 

5. Presentation of Subject Matter 
   Clear, definite and forceful. 
   Sometimes mechanical and monotonous. 
   Indefinite, involved, and monotonous. 

6. Sense of Proportion and Humor 
   Always keeps proper balance; not over-critical or over-sensitive. 
   Fairly well balanced. 
   [illegible] 

7. Self-reliance and Confidence 
   Always sure of himself; meets difficulties with poise. 
   Fairly self-confident; [illegible]. 
   [illegible] 

8. Personal Peculiarities 
   Wholly free from annoying mannerisms. 
   Moderately free from objectionable peculiarities. 
   [illegible] 

9. Personal Appearance 
   Always well groomed; clothes neat and clean. 
   Usually somewhat untidy; gives little [illegible]. 
   Slovenly; clothes untidy and ill-kept. 

10. Stimulating Intellectual Curiosity 
   Inspires students [illegible] effort; creates [illegible]. 
   Occasionally inspiring; creates mild interest. 
   Destroys interest in subject; makes work repulsive. 

Underline the phrase which best places the instructor as compared with other 
instructors: In my judgment this instructor is in 
(1) the highest fifth              (3) the middle fifth 
(2) next to the highest fifth      (4) next to the lowest fifth 
                                   (5) the lowest fifth 


In the study under review as in the one which is the subject 
of this paper, only the three most important of the ten traits were 
investigated: most important, that is, as determined by student 
judgments in another study.⁵ In this latter study each of 
approximately 100 students arranged the ten traits in their order of 
importance as he judged importance of the traits in a teacher. Chance 
halves of these rankings yielded substantially perfect agreement 
of the group in giving first place to Trait 5, Presentation of Subject 
Matter; second place to Trait 10, Stimulating Intellectual 
Curiosity; and third place to Trait 1, Interest in Subject. 

³The problem of validity of the judgments is hardly pertinent. While 
reliability may be defined as the accuracy with which a measuring instrument 
measures whatever it does measure, validity is defined as the extent to which 
the instrument measures what it purports to measure. Since it is student judgments 
that constitute the criterion, reliability and validity are in this case synonymous. 
To quote T. L. Kelley: “If competent judges appraise Individual A as being as 
much better than Individual B as Individual B is better than Individual C, then 
it is so, as there is no higher authority to appeal to.” The Influence of Nurture 
upon Individual Difference, Macmillan Co., 1926, p. 9. 

⁴See Remmers, H. H., “The College Professor as the Student Sees Him,” 
Bulletin, Purdue University, Studies in Higher Education XI, March, 1929, for a 
report of an extensive rating project by means of the Scale. This report also 
summarizes most of the research done on the Scale and published in various 
journals. In the present paper only some of the more pertinent researches will 
be mentioned in order to conserve space. 

⁵Stalnaker, J. M., and Remmers, H. H., “Can Students Discriminate Traits 
Associated with Success in Teaching?” Journal of Applied Psychology, Dec., 1928, 
pp. 602-608. 


III. EXPERIMENTAL PROCEDURE 


The average reliability of high school pupils’ judgments was 
found by a sampling procedure from the rating sheets of 57 
student teachers at the end of their practice teaching period of 
approximately eleven weeks (54 periods) in the local high schools.⁶ 
From each of the piles of sheets for each of the 57 student teachers 
two judges’ papers were drawn at random, and the judgmental 
values on each of the three traits in question written down. This 
yielded 57 pairs of pupils’ judgments on each of the three traits. 
These values for each trait were then entered in a correlation 
table, or scattergram. The sheets were then reinserted and the 
piles shuffled for another drawing of two sheets. This process 
was repeated until twenty correlations for each trait had been 
obtained. 
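
The drawing procedure just described lends itself to a brief sketch. The rating sheets themselves are not reproduced in this paper, so the data below are simulated under a stated assumption (each pupil’s rating is the teacher’s true level plus independent error); the helper names `simulate_sheets`, `pearson_r` and `average_pair_reliability` are hypothetical.

```python
import random
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Product-moment correlation of two equally long lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def simulate_sheets(n_teachers=57, n_pupils=25, noise_sd=1.5, seed=1):
    """Hypothetical ratings on one trait: true teacher level plus pupil error."""
    rng = random.Random(seed)
    sheets = []
    for _ in range(n_teachers):
        level = rng.gauss(0.0, 1.0)
        sheets.append([level + rng.gauss(0.0, noise_sd) for _ in range(n_pupils)])
    return sheets


def average_pair_reliability(sheets, draws=20, seed=2):
    """Draw two sheets per teacher, correlate the pairs across teachers,
    return the sheets to the pile, and repeat `draws` times."""
    rng = random.Random(seed)
    rs = []
    for _ in range(draws):
        pairs = [rng.sample(pile, 2) for pile in sheets]  # two pupils per teacher
        rs.append(pearson_r([p[0] for p in pairs], [p[1] for p in pairs]))
    return mean(rs), sorted(rs, reverse=True)


avg_r, rs = average_pair_reliability(simulate_sheets())
print(f"average r over {len(rs)} drawings: {avg_r:.3f}")
```

Sorting the r’s in descending order mirrors the arrangement used in the tables of the following section.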

In the measurement of halo effect the procedure was to select 
at random one judge’s ratings for each teacher and to obtain the 
intercorrelations of the three traits under investigation, the sam- 
plings being repeated until the required ten intercorrelations were 
obtained. 


IV. DATA ON RELIABILITY 


The average reliability of high school pupils’ judgments per 
single trait was found by the sampling procedure as described 
from the rating sheets of 57 student teachers. The twenty 
samplings of one pupil versus another pupil yielded the data 
presented in Table I where the r’s have been rearranged in de- 
scending order of magnitude. While a sampling of 20 7’s is 
rather meager, the labor involved is a rather effective deterrent 
from obtaining larger samplings. 


⁶I desire here to acknowledge the cordial co-operation of my colleague, 
Professor R. R. Ryder, who is in charge of the practice teaching work, in making 
the rating sheets available for my use after he had obtained the ratings for this 
purpose in the training of the practice teachers. 


TABLE I 

Reliabilities of High School Pupils’ Judgment for Trait 5, Presentation 
of Subject Matter 
Number of teachers = 57 

    r          r          r          r 

   .50      [illeg.]     .28        .15 
   .44        .30        .24        .13 
   .41        .29        .19        .11 
   .41        .29        .16        .11 
   .34        .29        .15        .11 

Average r = .261±.018 


Since it was shown in the paper mentioned at the beginning of 
this study⁷ that the Spearman-Brown formula validly applies to 
the prediction of the increase in reliability for a single trait to 
be expected with an increase in the number of judges, it follows 
from the average reliability determined in Table I that, to obtain 
reliabilities of .90 and .95, the average of the ratings of approxi- 
mately 25 and 54 pupils respectively would be required. These 
reliabilities compare favorably with those of the better standard- 
ized mental and educational tests now available, and there will 
be very few if any high school teachers who have less than 25 
pupils in their classes. It follows, therefore, that highly reliable 
ratings of high school teachers for the trait, Presentation of Sub- 
ject Matter, can be obtained if the sums or averages of 25 to 60 
pupils’ ratings be obtained. 
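
The figures of approximately 25 and 54 pupils can be checked by solving the Spearman-Brown relation for the number of judges. The sketch below assumes the standard algebraic inversion, a = R(1 − r) / [r(1 − R)], where r is the single-judge reliability and R the desired reliability; the function name `judges_needed` is hypothetical.

```python
def judges_needed(r_single, target):
    """Number of comparable judges whose averaged rating is predicted to
    reach reliability `target`, given single-judge reliability `r_single`."""
    if not (0.0 < r_single < 1.0 and 0.0 < target < 1.0):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    return target * (1.0 - r_single) / (r_single * (1.0 - target))


r_trait_5 = 0.261  # average r from Table I
for target in (0.90, 0.95):
    print(f"target {target:.2f}: about {judges_needed(r_trait_5, target):.1f} pupils")
```

The results are not whole numbers; rounded to the nearest pupil they give the 25 and 54 stated above.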

The twenty samplings of r for Trait 10, Stimulating Intellectual 
Curiosity, rearranged in descending order, are given in Table II. 


TABLE II 

Reliabilities of High School Pupils’ Judgment for Trait 10, 
Stimulating Intellectual Curiosity 
Number of teachers = 57 

    r        r        r        r 

  .466     .244     .131     .015 
  .342     .232     .129    -.003 
  .331     .229     .123    -.020 
  .273     .222     .108    -.050 
  .261     .151     .040    -.094 

Average r = .156±.021 

⁷Remmers, op. cit. 

The probable error of the average requires perhaps a brief 
explanation and comment. It is calculated from the most probable 
value (average) by formula. It is interesting to note that the 
calculation from the actual distribution agrees surprisingly well 
with that from the formula. Since there are only 20 r’s, the appli- 
cation of sampling theory becomes somewhat dubious because 
of the small sample. While these probable errors are given for 
what they are worth, it is to be borne in mind that they are prob- 
ably larger than the values obtained by the formula for the prob- 
able error of the average. 
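
The calculation from the actual distribution can be checked, at least approximately, against the twenty r’s for Trait 10. The sketch assumes the conventional relation P.E. = 0.6745 σ/√N for the probable error of a mean and reads the tabled entries as three-place decimals; whether σ was obtained by dividing by N or by N − 1 is not stated in the text, so the use of N here is an assumption.

```python
from math import sqrt
from statistics import mean, pstdev

# The twenty r's for Trait 10 (Table II), read as three-place decimals.
TABLE_II_R = [
    0.466, 0.342, 0.331, 0.273, 0.261,
    0.244, 0.232, 0.229, 0.222, 0.151,
    0.131, 0.129, 0.123, 0.108, 0.040,
    0.015, -0.003, -0.020, -0.050, -0.094,
]


def probable_error_of_mean(values):
    """0.6745 times the standard error of the mean (population sigma assumed)."""
    return 0.6745 * pstdev(values) / sqrt(len(values))


avg = mean(TABLE_II_R)
pe = probable_error_of_mean(TABLE_II_R)
print(f"average r = {avg:.3f}, probable error = {pe:.3f}")
```

Under these assumptions the result comes close to the tabled average of .156 and probable error of .021.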

The reliability of the average pupils’ judgment on Trait 10 is 
.156—rather appreciably lower than on Trait 5, Presentation of 
Subject Matter, where the average r was .261. Viewing this com- 
parison as a quantitative evaluation of the functional use of lan- 
guage, it may be said that pupils are more nearly agreed on the 
relative presence or absence of ability to present subject matter 
acceptably than they are as to the presence or absence of the 
ability of teachers to stimulate intellectual curiosity of pupils. 

To obtain reliabilities of .90 and .95 for this trait with high 
school pupils rating practice teachers requires 49 and 103 pupil 
judgments respectively. 

The results for Trait 1, Interest in Subject Matter, are given 
in Table III. 

TABLE III

Reliabilities of High School Pupils’ Judgment for Trait 1,
Interest in Subject
Number of teachers = 57

    r        r        r        r
  .349     .254     .222     .126
  .347     .244     .184     .122
  .345     .243     .174     .078
  .340     .223     .153     .069
  .279     .223     .143     .020

Average r = .207 ± .014


The average r determined for this trait, .207, indicates that to 
obtain reliabilities of .90 and .95, the numbers of pupils required 
are 34 and 73 respectively. 
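
The pupil counts quoted for the three traits can be checked with a short sketch. It assumes the usual Spearman-Brown relation, R = nr / (1 + (n - 1)r), solved for n; the function names are illustrative and not taken from the study.

```python
def pupils_needed(single_r: float, target: float) -> float:
    """Number of ratings n whose pooled reliability reaches `target`.

    Solves the Spearman-Brown relation R = n*r / (1 + (n - 1)*r) for n.
    """
    return target * (1.0 - single_r) / (single_r * (1.0 - target))


def pooled_reliability(single_r: float, n: float) -> float:
    """Spearman-Brown reliability of the sum or average of n ratings."""
    return n * single_r / (1.0 + (n - 1.0) * single_r)


# Average single-pupil reliabilities reported in the text.
AVERAGE_R = {"Trait 5": 0.261, "Trait 10": 0.156, "Trait 1": 0.207}

for trait, r in AVERAGE_R.items():
    n90 = pupils_needed(r, 0.90)
    n95 = pupils_needed(r, 0.95)
    print(f"{trait}: n for .90 = {n90:.1f}, n for .95 = {n95:.1f}")
```

Rounded to the nearest whole pupil, these values agree with the figures given in the text (25 and 54, 49 and 103, 34 and 73).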
It is apparent from the data presented that reliable judgments
for single traits may be obtained from high school pupils concern- 
ing practice teachers at the end of from ten to twelve weeks of 
acquaintance in the classroom situation. 

It is of interest at this point to compare the reliabilities obtained 
for the three traits under investigation. Table IV summarizes the 
pertinent data. 


TABLE IV

A Comparison of Reliability of Judgment for the Average High
School Pupil and College Student

                      High School    College*                   Diff./     Chances in
Trait                 Pupils         Students                   P.E. of    100 of a
                      (No. Teachers  (No. Teachers  Difference  Diff.      “True” Diff.
                      = 57)          = 37)

1. Interest in
   Subject            .207±.014      .290±.102      .083±.040    2.08        92
5. Presentation of
   Subject Matter     .261±.018      .429±.024      .168±.030    5.60       100
10. Stimulating
   Intellectual
   Curiosity          .156±.021      .354±.038      .198±.043    4.61       100

* Values obtained from Remmers, H. H. “The Equivalence of Judgments to Test
Items in the Sense of the Spearman-Brown Formula,” op. cit.


The reliabilities for the typical high school pupil judging stu- 
dent teachers at the end of approximately twelve weeks are con- 
sistently lower than those for the typical college student judging 
college teachers after an indeterminate time of acquaintance, but 
usually longer than twelve weeks. The probabilities are, of course, 
that with increased samplings these differences would remain of 
the same order of magnitude and would be in the same direction. 
These data also raise the interesting problem of optimum time 
of acquaintance for maximum reliability and minimum halo 
effect. 
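
The last columns of Table IV can be approximated with a small sketch. It assumes, as was customary, that the probable error of a difference between two independent values is the square root of the sum of their squared probable errors; the function names are illustrative only.

```python
import math


def pe_of_difference(pe_a: float, pe_b: float) -> float:
    """Probable error of a difference, assuming independent samples."""
    return math.hypot(pe_a, pe_b)


def ratio_to_pe(r_a: float, pe_a: float, r_b: float, pe_b: float) -> float:
    """Difference between two coefficients divided by its probable error."""
    return abs(r_b - r_a) / pe_of_difference(pe_a, pe_b)


# (high school r, its P.E., college r, its P.E.) as tabled for Traits 5 and 10.
ROWS = {
    "Trait 5": (0.261, 0.018, 0.429, 0.024),
    "Trait 10": (0.156, 0.021, 0.354, 0.038),
}

for trait, (r_hs, pe_hs, r_col, pe_col) in ROWS.items():
    pe_diff = pe_of_difference(pe_hs, pe_col)
    ratio = ratio_to_pe(r_hs, pe_hs, r_col, pe_col)
    print(f"{trait}: P.E. of diff. = {pe_diff:.3f}, ratio = {ratio:.2f}")
```

Under this assumption the probable errors of the differences come out at about .030 and .043, as tabled. The ratios differ slightly from the tabled values whenever the probable error is rounded before dividing.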


V. HALO EFFECT 


The question of the halo effect may next be considered. To
the extent that the intercorrelations of the three traits under
investigation are less than 1.00 when corrected for attenuation, each
trait is a psychological function independent of the other two
traits. For these intercorrelations samplings of ten correlations of
each of the three traits with the other two were obtained as
previously described. The results are given in Table V.⁸ The samplings
are again in descending order of magnitude of r, so as to make
more apparent the range.


TABLE V

Intercorrelations of High School Pupils’ Judgments for the Traits
Indicated

Samplings         Trait 1 vs.    Trait 1 vs.    Trait 5 vs.
                  Trait 5        Trait 10       Trait 10

    1              .31±.08        .18±.08        .40±.07
    2              .25±.08        .13±.08        .25±.08
    3              .21±.08        .12±.08        .17±.08
    4              .11±.08        .10±.08        .16±.08
    5              .07±.08        .06±.08        .08±.08
    6              .06±.08        .03±.08        .07±.08
    7             -.02±.08        .01±.08        .03±.08
    8             -.03±.08       -.12±.08        .01±.08
    9             -.57±.06       -.13±.08        .00±.08
   10             -.64±.05       -.14±.08       -.01±.08
Average           -.095           .024           .116
No. of teachers
  judged            64             64             64





The fact that the average of correlations of Traits 1 and 5 turns 
out to be slightly negative is the result of sampling fluctuations. 
In the light of other data it will be shown that the true value is 
most probably positive and of the order of magnitude of less than 
0.10. These other data are five samplings of intercorrelations of 
the summation of five randomly selected pupils against five other 
pupils for Trait 1 versus Trait 5 and Trait 1 versus Trait 10. 
These values will, of course, be more stable and should increase 
over the correlation for the one pupil’s judgment against another 
pupil’s judgment in accordance with the Spearman-Brown law. 
The results as shown in Table VI bear out this general hypothesis. 


⁸For the calculation of these correlations I am indebted to Miss Mary Trueblood.
TABLE VI

Intercorrelations of the Summation of Five High School Pupils’
Judgments for the Traits Indicated

Sampling     N     Trait 1 vs. Trait 10     N     Trait 1 vs. Trait 5
   1         57         .655±.05            57         .580±.06
   2         39         .432±.07            57         .317±.08
   3         57         .370±.07            39         .258±.08
   4         20         .038±.08            39         .243±.08
   5         39         .004±.08            20         .170±.08
Average                 .299                           .313








If from the average r in this table we calculate the
intercorrelation for one student (n = 5 in the Spearman-Brown formula),
a value of approximately .08 is obtained, which corresponds fairly
closely to the median value of the r’s for Traits 1 and 5. Taking
this value as an approximation of the correlation between the
judgments of two randomly selected pupils, we are in a position
to calculate the “true” value, i.e., the value corrected for
attenuation, by obtaining the ratio of the correlation of the two traits
to the geometric mean of their reliabilities.⁹ This yields a “true”
correlation of .34. This clearly indicates that these two traits as
psychological functions of the pupil-teacher relationship have a
considerable degree of independence of each other.¹⁰ In other
words, a typical class of high school pupils agree rather well as
to what they mean by the two traits in question, and are able to
discriminate these two traits as defined by the Scale. It follows
also as a corollary that in one sense different teachers possess
different amounts of these traits.

Treating the other two possible intercorrelations in a manner
similar to that just described, we obtain a “true” correlation
between Trait 1, Interest in Subject, and Trait 10, Stimulating
Intellectual Curiosity, of .13, an almost complete lack of dependence.

⁹Kelley, T. L., Statistical Method, Macmillan Co., New York, p. 204.

¹⁰Some correlation would, of course, be expected even if no halo effect were
present, since the weight of all evidence favors the conclusion that desirable
measurable characteristics are positively correlated. The amount of true
psychological concomitance in the occurrence of the traits is, however,
indeterminate. From the standpoint of measurement theory it is, of course, only
necessary to demonstrate the lack of correlation in order to validate the trait
ratings.

For judgments of Presentation of Subject Matter as correlated 
with Stimulating Intellectual Curiosity the interdependence is 
much greater. Correction for attenuation yields a value of .58. 
Although this indicates considerable overlap of the two traits, 
nevertheless the lack of overlap is greater than the amount of 
overlap. All three traits may be said to constitute psychological 
functions having a gratifying degree of independence of each 
other. 
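
The three corrected values can be approximated with the sketch below. It applies the correction for attenuation as described in the text: the observed correlation divided by the geometric mean of the two reliabilities. The observed values for the second and third pairs are assumed here to be the Table V averages (.024 and .116), since the text does not state them explicitly; the names are illustrative.

```python
import math

# Average single-pupil reliabilities for Traits 1, 5 and 10 (from the text).
RELIABILITY = {1: 0.207, 5: 0.261, 10: 0.156}

# Observed single-pupil intercorrelations. The first is the value of about .08
# derived in the text; the other two are ASSUMED to be the Table V averages.
OBSERVED = {(1, 5): 0.08, (1, 10): 0.024, (5, 10): 0.116}


def corrected_for_attenuation(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Observed correlation divided by the geometric mean of the reliabilities."""
    return r_xy / math.sqrt(rel_x * rel_y)


for (a, b), r_xy in OBSERVED.items():
    true_r = corrected_for_attenuation(r_xy, RELIABILITY[a], RELIABILITY[b])
    print(f"Trait {a} vs. Trait {b}: corrected r = {true_r:.3f}")
```

On these assumptions the corrected values fall within about .01 of the reported .34, .13 and .58.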

Results for college students as to halo effect can be briefly
summarized. Table VII gives the pertinent data. It appears that, in
general, there tends to be a greater halo effect in the judgments
of college students for Traits 1 versus 5 and 1 versus 10 than in
the judgments of high school pupils for the same traits. Traits 5
and 10 are somewhat less closely correlated for college students
(though not significantly so) than for high school students. The
important fact again, however, is that of the relative independence
of the three traits of each other.

TABLE VII

Intercorrelations of Judgments on Traits 1, 5, and 10
of College Students for 76 Instructors
Number of Samplings of r = 20

Traits                       r Corrected for
Correlated     Average r     Attenuation         Range of r’s
1 vs. 5          .183           .525             .023 to .363
1 vs. 10         .123           .384            -.095 to .352
5 vs. 10         .188           .488            -.130 to .483

The conditions which might bring about the differences ob- 
served between high school pupils and college students cannot 
be deduced from the present data. Further research is required 
to isolate them. Relative “range of talent” in both judges and 
judged offers one plausible hypothesis. It may be that practice 
teachers are considerably less variable than are college teachers 
in the traits here in question. It is also possible that the range 
of judgmental ability is greater for college students than for high 
school pupils. Again, a difference in the “acquaintance factor”
may exist to account for the differences in halo effect. All of 
these possibilities are amenable to experimental investigation. 


VI. CONCLUSIONS


Apart from the theoretical implications of the present study, 
its practical results are such as to warrant the following generali- 
zations. 

(1) Reliable judgments of classroom traits of instructors can 
be obtained from both high school pupils and college students. 

(2) The traits here investigated, the three most important of 
ten in the scale, namely, Interest in Subject, Presentation of Sub- 
ject Matter, and Stimulating Intellectual Curiosity, have very 
little psychological interdependence. 

(3) It is probable that high school pupils will invest the
practice teacher with less halo than college students will their
instructors.
TRAINING COLLEGE FRESHMEN
TO READ 


HERBERT MOORE 
Mount Holyoke College 


THE percentage of college freshmen who are unable to read
and comprehend college reading matter ranges from 8 to
25 per cent, depending upon admission requirements and
types of required reading. The group depends upon the standard
of the college: those who rank in the 50th percentile in one college
group may be anywhere from the 20th to the 70th in another
group. Hence standardized tests of reading of diagnostic
significance in one group are of very little value in another.

In Mount Holyoke we have given standardized reading tests to 
incoming students for the past four years and have in the case of 
each test found less than two per cent of the class who, on the 
basis of the test indices, should have remedial work. On the other 
hand a much larger percentage find difficulty in mastering the 
reading requirements of the freshman year. The causes of that 
difficulty are the usual ones reported in study problems; viz., slow 
reading rate, poor vocabulary, inability to grasp the central issues 
in paragraphs, and frequent failure to see the same meaning in 
differently phrased sentences. Detection of these students on the 
basis of raw scores from existing tests could not be made; and 
it was this need which led to the tentative adoption of a test which 
incorporated the difficulties and which we have found success- 
ful in selecting those who afterwards find college work severe. 

The test has seven parts, with one part, the vocabulary test of
35 words, serving as the criterion by which the difficulty, time
limits, scores and values of the other parts were determined. The
vocabulary test is composed of the most differentiating words found
in the 200-word vocabulary test used by the Department of English at
Mount Holyoke. Those words defined correctly by from 56 to 70
per cent of the freshman class were selected. The complete test
gives a normal distribution and correlates +.82 with the linguistic
section of the Scholastic Aptitude Test (1931). In the English
Department test the alternatives are given in multiple-choice
phrase form; in the Reading Test the alternatives are given in
multiple-choice single-word form. Seventy words were selected
and divided into two equally difficult groups. The correlation
between the two groups for 105 sophomores is +.96.

Other sections of the Reading Test include one on paragraph 
meanings (one prose and one poetry, each containing 25 lines 
about which five questions are asked); one on central meanings 
(four passages with multiple choice responses); one on proverbs 
(a test in which each of six proverbs is to be matched with one 
of similar meaning in a list of twelve); a symbols test (twelve 
figures the word equivalents of which are to be selected from 
a list of twenty-four); a paragraph heading test (five para- 
graphs, with multiple choice responses); and a rate of reading 
test (a 48-line passage in which there are six statements, true- 
false and multiple choice). 

The test was prepared in two equivalent forms. Between the
two the correlation from 105 cases is +.92. Form A of the test
was given the entering class in 1933. The distribution of the 225
cases is given in Graph I. On the same graph the distribution of
scores of two freshmen classes in two eastern colleges (Mount
Holyoke and Springfield) is given. The joint median is 57, Mount
Holyoke’s being 61 and Springfield’s 54. The Mount Holyoke
results correlated +.68 ± .02 with the results from the linguistic
section of the Scholastic Aptitude Test.

GRAPH I
Distribution of Results from Form A of the Mt. Holyoke Reading
Test: (a) 225 cases, Mt. Holyoke (solid line); (b) 382 cases,
Mt. Holyoke and Springfield (broken line).

Students who were in the lowest quartile of the Reading Test 
and also in the lowest local quartile of the linguistic section of 
the Scholastic Aptitude Test were invited to take a six-weeks 
training course in How to Study. During that period the class 
met once a week to discuss the usual subjects covered in a How- 
to-Study course. Each week exercises were given in finding 
central meanings, increasing reading rate, building up words, 
analyzing words, and comparing the meanings of similar proverbs. 
Each student was asked to select two books for special reading, 
one of which she read from for five minutes a day and one ten 
minutes a week. A graphic record of the number of lines read 
was kept by the student. Each freshman conferred with a senior 
for about forty minutes weekly, during which time the week’s 
assignments were discussed, current difficulties in other courses 
were cleared up, and a forty-hour-week schedule was drawn up 
and followed. 

GRAPH II

Distribution of Results before and after the Training Period:
(a) Form A, before training period (solid line); (b) Form B, after
training period (broken line). Vertical axis: Number of Cases;
horizontal axis: Raw Score.


At the end of the six weeks Form B of the Reading Test was
given. The results of Form A and Form B are given in Graph II.
The median from Form A is 46, from Form B for the same group,
73. The per cent of improvement ranged from 4 to 150: of these

     5 made less than 25 per cent improvement,
    13 between 25 per cent and 49 per cent,
    10 between 50 per cent and 74 per cent,
     6 between 75 per cent and 99 per cent,
     5 between 100 per cent and 124 per cent,
 and 2 above 125 per cent.


The differences in improvement made depended upon a number 
of factors. Of those making less than 50 per cent improvement, 
three attended two discussions and three conferences, two attended 
one discussion and one conference, two attended four discussions 
and no conference, and one attended one discussion and no con- 
ference. The gain made seemed to depend somewhat on the 
senior with whom the freshman worked. Thirteen seniors as- 
sisted. The average improvement made by the groups working 
with different seniors ranged from 29 to 107 per cent. The indi- 
vidual improvement made by the freshmen working with the 
three seniors whose groups made the greatest average gain was 
over 50 per cent in all cases but one, whereas the individual im- 
provement made by the freshmen working with the seniors whose 
groups made the least average gain was less than 50 per cent in 
all but two cases. 

Some of the obvious inferences from the experiment are: 

1. Lectures on How to Study are of very little value unless ac- 
companied by exercises on the specific points discussed. 

2. Conferences with seniors are of as much value as class dis- 
cussion. Over 70 per cent of the group reported the conferences 
as the most valuable part of the experiment. 

3. Vocabulary weakness is primarily due to unfamiliarity with 
word roots and affixes. Improvement can best be made by word 
analysis after the common prefixes and suffixes are known. 

4. The most common weakness is in the field of meanings and 
central issues in paragraphs. Freshmen’s capacity for losing sight 
of the core of paragraphs and becoming lost in incidental details 
seems to bespeak a weakness in courses in English grammar
and composition during pre-college days.











THE AGREEMENT OF THREE OBSERVERS 
AFTER PRACTICE IN SIMULTANEOUS 
RECORDING OF BEHAVIOR 


WALTER C. RECKLESS 
Vanderbilt University 


MAPHEUS SMITH 


University of Kansas 


A TEST team of three observers was organized to see what
light its experience would throw on the problem of observing
behavior in social situations by the crudest and most unrefined
methods. The scene of observation was the playroom of the
receiving home of a child-placing agency. Children from four to
seven years of age were allowed to romp in the playroom with a
minimum of supervision. In fact, the supervisor merely interfered
to prevent injuries, to settle disputes, or to hand out play
materials. While the observations were going on, no effort was
made by adults to impose organized play on the children.

The observers recorded simultaneously the behavior of the same 
individual child for a period of five minutes (stop watch). They 
were instructed to write down (in stenographers’ notebooks) what 
the child did or as much of what he did as they could see and 
hear. No shorthand abbreviations were used. The observers were 
asked to make their notations concise, to record as many different 
acts the child manifested as possible, to use simple concrete lan- 
guage (i.e., untechnical), to read nothing into the behavior of 
the child, to make no inferences (i.e., to be as objective as pos- 
sible). 

The observers were not trained ahead of time. They were
not told what to observe but were given a free hand. They were,
however, cautioned against using subjective or interpretative
adjectives and adverbs.¹ The test team did not know for what
purpose its observations would be used. It had no set of behavior
categories to follow or check off. Consequently, the recordings
of the three members of the team represent not merely a crude
form of observation but also a method with a minimum
predetermination of results. The purpose of this paper is to report
the extent to which the three observers, using the method
indicated, noted the same kinds of behavior.²

The experience of the test team covered 114 simultaneously re- 
corded observations of five minutes’ duration each. The data 
for this article are taken from the last 28 five-minute recordings 
of the whole experience, on the assumption that they would repre- 
sent the team’s highest point of efficiency in longhand recording. 

These 28 five-minute written recordings, simultaneously made 
by three members of the test team, were analyzed by use of a list 
of 62 descriptive behavior categories. The unit for category 
classification was taken to be an act as defined by a verb such as: 
shoves Mary, throws ball, skips across room, looks at Miss S., goes 
to sand table, twists string, etc. For example, “looks at Miss S.” 
falls into behavior category “looks at adults”; “twists string” falls 
into category “manipulates objects.” 

The list of the 62 behavior categories used for purposes of
classifying the various recorded acts is given in Table I. It should
be noted that the categories were assembled some time (almost
a year) after the team’s observations were made and were in no
way used in the actual recordings.³

¹A count was made of the words (according to parts of speech) in the 43d, 44th,
45th, and 46th consecutive simultaneous five-minute recording of each member
of the test team. On an average for all three observers, about 35% of the words
consisted of nouns and pronouns; 33%, verbs; 21%, prepositions; 10%, adjectives
and adverbs.

²The observers were selected on two bases: first, that they were interested in
making observations in the situation described; second, that they could write over
150 words in five minutes. According to two preliminary writing tests, during
which the three observers copied as much as they could in five-minute trials from
unfamiliar passages of a book, the observers averaged well over 150 words copied
in the five minutes. In the first trial they averaged 166.0 words per five minutes;
in the second, 167.3. An analysis of 20 of the 28 recordings of the present study
showed that the observers recorded, on an average, 52.98 acts per five minutes
of observation and 2.7 words per act, or 143 words per five minutes. It is clear
that the team in its average recording speed (143) was not far behind its average
copying speed (167). Judged from copying and recording speed, observer I was
the most efficient, observer II was next, while observer III was last.


TABLE I 


List of Behavior Categories Used in Classification and Tabulation of 
Written Recordings 


1. Is Inactive: Stands, sits; does not move body or face.
2. Moves mouth: Touches mouth with hand or object, opens or closes mouth,
   moves lips, licks lips, blows through mouth, etc.
3. Moves Face: Contracts any facial muscles without pattern response by others.
4. Moves Body: Moves or turns any part of head or body so long as there is no
   locomotion.
5. Looks at self.
6. Looks at Objects.
7. Has Contacts with Objects: Touches, bumps into, leans against, etc.
8. Manipulates: Handles objects in any way.
9. Ignores Others: Does not change own activity when others act.
10. Looks at Adults.
11. Looks at Children.
12. Walks.
13. Runs.
14. Rides: Includes “Swings.”
15. Shows other Locomotion: Changes position with regard to stationary objects.
16. Shows Pattern Play with Objects: “Games” that are defined in the group, such
    as “Drop the Handkerchief.”
17. Shows Pattern Play without Objects: “Frog in the Middle,” “London Bridge,”
    etc.
18. Sings: Pattern words or tune.
19. Attacks Persons: Hits, pushes, jerks, etc.
20. Has other Contacts with Persons.
21. Approaches: Moves toward or speaks to without locomotion.
22. Takes: Takes without asking for.
23. Repulses: Maintains physical position when attacked.
24. Disobeys: Behaves contrary to spoken command.
25. Disagrees with: Disagrees in speech.
26. Withdraws: When others act toward the child the child avoids them.
27. Follows: Does what another has just done.
28. Obeys: Behaves according to spoken command.
29. Yields to Attack: Cowers, gives up what has, etc.
30. Gives: Does something by which another benefits, such as “Pushes in swing,”
    “pushes in wagon,” etc.
31. Makes negative body gestures: “Shakes head ‘no,’” “Shrugs ‘no,’” etc.
32. Makes positive body gestures: “Shakes head ‘yes,’” etc.
33. Makes indefinite body gestures: Can’t interpret the precise meaning although
    they are responded to.
34. Makes facial gestures: Winks, glances, etc., which communicate.
35. Shows satisfaction: Smiles, laughs, sings.
36. Shows dissatisfaction: Frowns, cries.
37. Shows indefinite emotional expressions: Shows indefinite expressions, shouts,
    etc.
38. Commands: Speaks imperatively.
39. Decides: After disagreement child’s solution immediately precedes agreement
    or action.
40. Exclaims: Shouts short, loud statements, or expressions like “ah,” “oh,” etc.
41. Declares: Makes declaratory statements.
42. Requests: Asks for something.
43. Repeats self: Says exactly what he has said a moment before.
44. Suggests: Makes expressions of “let us,” “suppose,” etc.
45. Questions.
46. Whispers.
47. Agrees: Acknowledges acceptance of another’s act.
48. Uses Other Speech: Includes unclassifiable speech.
49. Expresses Individuality: Says “I” and “me.”
50. Expresses Group Solidarity: Says “we” and “us.”
51. Others are Inactive: Others are definitely stated as inactive.
52. Is Obeyed: Others do what child says do.
53. Is Yielded to: Others cower or give up what they have when child attacks.
54. Is Followed: Others do what child has just done.
55. Is Withdrawn From: Others avoid when child acts toward.
56. Is Ignored: Others do not change when child acts toward or near to.
57. Is Repulsed: Others maintain their physical position when child attacks.
58. Is Differed With: Is disagreed with in speech.
59. Is Approached: Others move toward or speak to without locomotion.
60. Is Attacked: Is hit, pushed, jerked, etc.
61. Is Taken From: Others take without asking for.
62. Is Given: Others offer, or do something by which child is benefitted.⁴

³The authors realize that a standardized set of behavior categories, if used in
actual noting of behavior, would facilitate speed and accuracy, or if used for
classification, would yield a more reliable frequency distribution. The list herein
used has not been constructed or validated by judgment tests.


The results of the classification of the behavior recorded in 
the 28 five-minute observations of this experiment are presented 
in Table II. 

The frequencies of recorded acts falling in the various cate- 
gories are given in rank order, according to the average for all 
three observers. By reference to the category number one gets a 
general impression as to what types of behavior were most fre- 
quently noted and what types rarely noted. In fact, no recorded 
behavior was classifiable under ten of the 62 categories. The ten 
which were eliminated from Table II because none of the three 
observers recorded any behavior which could be classified therein 
are categories 5, 9, 23, 24, 25, 49, 50, 53, and 58. 

It may be that the behavior which was most frequently noted
was the easiest to see and record. It may be that the behavior
which was seldom or never recorded was either difficult to note
or actually was rare. At any rate, Table II gives the classified
distribution of behavior as noted by students who were not told to
look for anything in particular, but merely told to note what they
saw and heard. The distribution is of significance only when the
human equation of the observers is taken into account.
Consequently, it only imperfectly and partially represents the absolute
quantum of behavior any observed child showed in a five-minute
period. Undoubtedly if motion pictures had been taken during
each period of observation, some idea, if not a measure, of the
amount of behavior erroneously recorded or entirely missed could
have been obtained. However, there is no reason to believe that
the recorders misjudged or failed to note a large portion of a
child’s behavior which would be significant in a social situation
of the playroom.

⁴Mapheus Smith, A Study of the Behavior of Institutional Children in an
Unsupervised Play Situation, Doctor’s Dissertation, Vanderbilt University, 1931,
pp. 36-38.

By way of caution, one should also note that the distribution 
in Table II cannot be taken to indicate anything representative of 
orphan children. In fact, only six different children were involved 
in the 28 five-minute periods of observation, which extended over 
twelve different days from January 27 to March 8, 1930. It 
might be the authors’ impression that the children in the play- 
room of this receiving home show a poverty of socially stimulated 
behavior, owing to lack of institutional facilities and organized 
direction and to impoverished family and community back- 
grounds; but such an impression cannot be superimposed on the 
recorded behavior of the six dependent children or of institutional 
children in general. In other words, the frequencies of the re- 
corded acts are used in this paper to tell something about three 
observers who observed simultaneously—not about the children 
who were observed. 

The coefficient of variability as given in the last column of Table
II indicates the relative degree of variation among the three
observers for each category.⁵ It may be seen that the percentage
of variability tends to be lowest for the categories with the
highest frequencies of recorded acts and highest for those with the
lowest frequencies. But as indicated above, the behavior falling
under the categories with the highest number of recorded acts
may be the behavior which is the most obvious or easiest to note.

⁵The mean or average deviation was obtained for the three observers and
divided by the average (mean) number of recorded acts for all three:
$V = \mathrm{A.D.}/M$.
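
As a rough illustration of the coefficient of variability (average deviation of the three observers' counts divided by their mean), the sketch below applies the definition to two rows of counts that remain legible in Table II. The function name is illustrative; the results are not asserted to match the printed variability column, which is not legible here.

```python
def coefficient_of_variability(counts):
    """Average (mean absolute) deviation divided by the mean: V = A.D. / M."""
    mean = sum(counts) / len(counts)
    average_deviation = sum(abs(c - mean) for c in counts) / len(counts)
    return average_deviation / mean


# Acts recorded by observers I, II and III for two categories in Table II.
CATEGORY_8 = (181, 183, 193)   # Manipulates
CATEGORY_11 = (135, 114, 53)   # Looks at Children

for label, counts in (("8", CATEGORY_8), ("11", CATEGORY_11)):
    v = coefficient_of_variability(counts)
    print(f"Category {label}: V = {100 * v:.1f} per cent")
```

The observers agree closely on category 8 and much less closely on category 11, which is what the contrast in V expresses.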

TABLE II

Number, Average, Per Cent and Variability of Acts Recorded by Three
Observers, Classified by Behavior Categories

             No. Acts Recorded
Category      I      II     III     Average
    8        181    183    193      185.7
    4        130    163    181      158.0
   21         97    127     82      102.0
   11        135    114     53      100.7
   15         82     77     78       79.0
   10         86     73     76       78.3
    1         69     72     61       67.3
    6         80     62     35       59.0
   12         58     66     43       55.7
   30         57     45     35       45.7
    2         46     39     45       43.3
   13         37     43     29       36.3
   35         43     26     36       35.0
   41         35     38     31       34.7
   32         29     20     29       26.0
   48         24     27     16       22.3
    7         18     25     20       21.0
   19         20     24     20
   14         17     17     17
Total      1,376  1,324

[The per cent and variability columns are not legible in the source. Only
fragments survive for categories 20 (11, 9), 26 (10, 16), 17 (7), 38 (12),
and 59 (16); no entries are legible for categories 27, 55, 34, 36, 57, 56,
47, 3, 29, 39, 44, 46, 51, and 61, nor for the total of observer III.]

Intercorrelation of the frequencies of recorded acts, as classified
according to the given list of behavior categories, yields coefficients
of .98 ± .006 (S.E.) for observers I and II, .92 ± .021 for
observers I and III, and .95 ± .014 for observers II and III. The
high coefficients make it appear that there is a very close
correspondence between the frequencies of noted behaviors (i.e., the
recorded acts) as classified by categories for each pair of observers.
Inspection of Table II reveals that the frequencies of recorded
acts comprise a discontinuous series. There seem to be clusters
of high and low frequencies. And it may be that the high
frequencies pull up the regression line, resulting in a fictitiously
high correlation. In order to test this out, correlations were
calculated for the first 26 frequencies which include the high values in
Table II, the last 26 frequencies which contain the low frequencies,
and the middle 26 frequencies. The findings were as follows:
                 First        Last         Middle
                 26 Items     26 Items     26 Items

r (I, II)        .97±.012     .65±.113     .92±.028
r (I, III)       .89±.040     .47±.149     .86±.051
r (II, III)      .93±.026     .68±.104     .81±.067

The coefficients of correlation for the combinations within the
arrays of Table II are for the most part satisfactorily high and on
the whole lend support to the validity of the original coefficients.

One notices that the coefficients of correlation are much higher
for the first than the last 26 items in Table II for any pair of
observers. The first 26 items, it should be recalled, are the highest
frequencies and show the smallest variability among observers.
The last twenty-six contain the lowest counts and show the greatest
variation among observers.

*The correlation formula used was

$$ r = \frac{\sum XY - nM_xM_y}{\sqrt{\left(\sum X^2 - nM_x^2\right)\left(\sum Y^2 - nM_y^2\right)}} $$

the formula for the standard error of the coefficient of correlation was

$$ \mathrm{S.E.} = \frac{1 - r^2}{\sqrt{N}} $$
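The two footnote formulas (the product-moment r from raw sums, and its standard error) can be sketched in a few lines of Python. This is only an illustration: the function names and the category counts are hypothetical and are not taken from Table II.

```python
import math


def pearson_r(xs, ys):
    """Product-moment r computed from raw sums, as in the footnote formula."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long series with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum(x * y for x, y in zip(xs, ys)) - n * mean_x * mean_y
    spread_x = sum(x * x for x in xs) - n * mean_x ** 2
    spread_y = sum(y * y for y in ys) - n * mean_y ** 2
    return numerator / math.sqrt(spread_x * spread_y)


def standard_error(r, n):
    """S.E. of r by the footnote formula (1 - r^2) / sqrt(N)."""
    return (1 - r ** 2) / math.sqrt(n)


# Hypothetical category counts for two observers (illustrative only).
observer_a = [48, 19, 14, 20, 26, 17]
observer_b = [24, 20, 17, 11, 10, 7]
r = pearson_r(observer_a, observer_b)
print(round(r, 3), round(standard_error(r, len(observer_a)), 3))
```

With r = .97 and N = 26, `standard_error` gives about .012, which agrees with the figure reported for observers I and II on the first 26 items.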

The practical significance of the high correlations is that the 
three observers noted closely corresponding amounts of various 
types of behavior. If, for example, one observer had observed a 
child, according to the methods outlined above, for 28 five-minute 
periods, he would have obtained, it seems reasonable to suspect, 
a distribution of recorded acts fairly close to the distribution an 
equally competent observer would have obtained.’ 

If the field workers of the child placing agency, prior to placing 
a child in a foster home, wanted a statistical report on his behavior 
profile—to indicate what sort of individual he really is, could they 
rely on a profile made from a classification of an observer’s writ- 
ten recordings? It seems that they could, if the observer were as 
competent as any one of the three in the test team. The relative 
competency could be determined, we believe, by an analysis of the 
word counts in his written recordings (especially a low subjective 
adverb-adjective count), by his recording speed (the average 
number of acts per five minutes and the average number of words 
per act), and by the extent to which he brought motivation, in- 
ference or interpretation into the recorded acts. 

If we had adequate behavior norms for young dependent chil- 
dren, the report on the child’s behavior profile could indicate 
whether he was above or below the average in certain types of 
behavior. In fact if this type of observational method were used 
in receiving homes for study of children prior to placement, the 
categories could be revised to include the types of behavior signifi- 
cant for intelligent child placement. 

There are several methodological questions about an observa- 
tional method which relies on mere written recordings subse- 
quently classified by behavior categories. The fact that the written 


*Smith found that 20 five-minute periods of observation (i.e., 100 minutes),
scattered over several days and at different times of the day, were ample to obtain 
a consistent frequency distribution of a child’s behavior in the playroom situation. 
See Mapheus Smith, “Concerning the Magnitude of the Behavior Sample for the 
Study of Behavior Traits in Children,” Journal of Applied Psychology, Vol. XV 
(1931), pp. 480-485. 













recordings do not determine in advance what an observer ob- 
serves is at once a weakness and a strength. One might say that 
too much reliance is placed on the observers. If the observers 
are not competent, the recordings suffer. Besides, observers with 
different training backgrounds might tend to note certain types 
of behavior and overlook other types. Did the test team’s observa- 
tions show close correspondence because all three members were 
students majoring in sociology and had been imbued with the 
same point of view regarding behavior in a situation? What 
sorts of behavior would, for example, a young psychiatric social 
worker have tended to emphasize and how would the category 
frequency of acts recorded by her compare with that of any mem- 
ber of the test team? Further study will have to indicate the extent 
to which written recordings are affected by differences in the 
theoretical and vocational training of the observers. 

Another weakness of the method is the unit of behavior. Ob- 
servers might record “bites fingernails” or “plays with blocks.” 
The former might be actually one act or a simple series of acts, 
while the latter might cover a complicated series of behaviors. 
Clearly they are not equal or identical units any more than a 
plum is equivalent to a bunch of grapes. But neither are farms, 
homesteads, families, households, and persons unchallenged units 
in census enumeration. 

While the acts in written recordings may not be comparable 
units in the logic of science, they may be defended as practical 
units on the grounds that they are the obvious things a child or 
individual was doing at the time he was observed. After all, 
there are no natural units of behavior in a situation. The visible 
acts, whether they are simple, compound, integrated or uninte- 
grated, are the important data for observation of behavior in a 
situation. The invisible elements of behavior, the so-called sub- 
jective aspects of behavior, important as they are, must be in- 
ferred, even though “tests” and other indicators are used. 

The strength (i.e., the good points) of the written recordings 
method might very well exist in the fact that the observers are 
not forcing behavior into arbitrarily conceived categories at the 
time of observation, thereby predetermining what acts must be 
observed and in what configurations. The visible acts become re- 
corded data. Anyone in the future can then devise any set of units 
or categories by which to classify or analyze the data. If interviews,
to take an example from another field, are recorded in the
original language of the dialogue, they become data, which anyone 
can analyze according to any system of concepts. However, if 
notations on a case are made in technical language at the time of 
interviewing, no one can ever get behind the concepts to the data. 

In regard to the use of the present list of categories for classifying
the recorded acts or data, could two or more competent persons
obtain the same frequency distribution? We are not sure
that the present list of categories is adequate, because many of 
the categories overlap. However, categories could be defined in 
such a way as to facilitate close correspondence in classification of 
behavior by two or more persons. 

Furthermore, a different set of categories would have yielded 
a different frequency distribution. But in all probability there 
would still have been a close correspondence between the classified 
act frequencies for the three observers. And what would have 
happened if the list of categories had been larger or smaller than 
the present list? It has been shown from this very type of written 
observations that a smaller and more inclusive number of cate- 
gories used for classification of recorded acts reduces the variation 
between observers and that a larger number increases the
variation.*

Finally, the present method fails to connect the observed acts 
with time and space. When a child runs and is recorded as running,
we do not know how far he runs, or just what distance he
traverses. When he plays with blocks, we do not know how long he
plays before he turns to something else. These two factors would
be very desirable in the behavior record, just as they were found to 
be important in Thomas’ floor plan technique and observation. 
But the floor plan technique had to sacrifice detail in order to get 
time and distance on the behavior record. If motion pictures 
were taken, one could indeed measure the duration of certain bits 
of behavior, but it is doubtful whether one could get distance. 
The point is that there is as yet no perfect method of observation 
of situational behavior. There are shortcomings in all existing 
methods which need to be better known and understood. 


*See Mapheus Smith, “The Agreement of Observers Concerning Groups of 
Behavior Traits,” Journal of Juvenile Research, vol. XV (1931), pp. 249-250. 








A BATTERY OF PERFORMANCE TESTS 
(THE ARTHUR SCALE REVISED) 


HARRY C. MAHAN 
Warren State Hospital 


INTRODUCTION 


ALTHOUGH performance tests are almost universally used
by psychologists in clinical practice it is doubtful if any
two clinics use exactly the same combination of tests or the 
same method of weighting scores. There are certain tests which 
have been found more useful than others and a number of these 
have been combined by Arthur (1, 2) into a Performance Scale. 
Although her scale was apparently well standardized on a sufficient 
number of cases her statistical methods of treating the results seem 
a bit too cumbersome for ready clinical application. The present 
study is an attempt to present her original data in a somewhat 
simpler and possibly more usable form. 

In the original investigation Arthur standardized two complete 
scales. The tests considered below were taken from the second 
of these: Form II. The criterion group for this scale consisted 
of 557 children with a mean chronological age of 9.4 years and 
a standard deviation of 3.15. The mean mental age on the Binet 
test was 9.7 years and the standard deviation 3.07. The subjects 
were “public school pupils of a good middle-class ‘American’ 
district” (2, p. 11). The data for the present paper were the 
raw scores on the tests as they are given in the original manuscript 
(2, p. v) on file in the University of Minnesota Library. 


STATISTICAL TREATMENT*


The first step in the analysis of the raw scores was to determine
the degree of correlation between the various tests. The regressions,
somewhat to our surprise, were found to be linear, making
possible the use of the Pearson r as a measure of correlation. There
were, however, peculiarities in the distributions of scores on the
Five Figure form board and the Ship Test and these two were
dropped from further consideration. The final battery consisted
of the following five tests: (1) Knox Cube, (2) Seguin form board,
(3) Healy Picture Completion II, (4) Porteus Maze, and (5)
Kohs Block Design. The intercorrelations between the various
tests in the order numbered above are given in Table I. Variable
6 is chronological age and variable 7 Binet mental age. The
correlations with variable 6 partialled out are given in Table II to
show the relationship of the tests when physical maturity, as
measured by chronological age, is held constant.

*The use of Hollerith machines greatly facilitated the necessary calculations.
This work was done by the writer at Ohio State University under the direction of
Herbert A. Toops and H. A. Edgerton.
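As a rough check on how the entries of Table II follow from those of Table I, the standard first-order partial correlation formula can be applied to any pair of variables with chronological age (variable 6) held constant. The paper does not print this formula, so its use here is an assumption, and the function name is hypothetical.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Knox (1) with Binet mental age (7), chronological age (6) partialled out.
# Inputs are the Table I entries r17 = .735, r16 = .639, r67 = .918.
knox_mental_age = partial_r(0.735, 0.639, 0.918)

# Knox (1) with Seguin (2): r12 = .709, r16 = .639, r26 = .812.
knox_seguin = partial_r(0.709, 0.639, 0.812)

print(round(knox_mental_age, 3), round(knox_seguin, 3))
```

Both results fall within rounding of the corresponding Table II entries (.486 and .424).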

The method of weighting which gives an age score on the per- 
formance tests most closely approximating Binet mental age is the 
multiple regression equation. Mental age on the Binet is con- 
sidered as the dependent variable and scores on the performance 


TABLE I

Table of Intercorrelations

Variable     No.     1       2       3       4       5       6       7

Knox          1    1.000    .709    .595    .691    .619    .639    .735
Seguin        2     .709   1.000    .727    .766    .690    .812    .841
Healy II      3     .595    .727   1.000    .721    .685    .781    .801
Porteus       4     .691    .766    .721   1.000    .696    .756    .791
Kohs          5     .619    .690    .685    .696   1.000    .704    .772
Chron. Age    6     .639    .812    .781    .756    .704   1.000    .918
Mental Age    7     .735    .841    .801    .791    .772    .918   1.000

TABLE II

Table of Partial Correlations with Age Held Constant

Variable     No.     1       2       3       4       5       7

Knox          1    1.000    .424    .120    .413    .310    .486
Seguin        2     .424   1.000    .255    .398    .286    .413
Healy II      3     .120    .255   1.000    .319    .305    .339
Porteus       4     .413    .398    .319   1.000    .352    .374
Kohs          5     .310    .286    .305    .352   1.000    .446
Mental Age    7     .486    .413    .339    .374    .446   1.000
tests as the independent variables. The Beta weight for each of
the tests was found directly from the correlation coefficients by
the Doolittle process as described by King (6). The b or raw score
weights are found from the Beta weights by the formula

$$ b_{x_i} = \beta_{x_i}\,\frac{\sigma_{x_7}}{\sigma_{x_i}} $$


These weights for the tests used separately and in combination 
are given in Table III. As the scores had been coded to facilitate 
the calculations the weights given are for the coded scores rather 
than for raw scores on the tests. The regression equation for 
estimating Binet mental age from scores on the five performance 
tests is as follows: 

$$ .13(2X_1) + .17(100 - X_2) + .18(X_3/5) + .06(2X_4) + .10(X_5/6) - 9.07 $$


In using this method the range of the estimated scores is somewhat
lower than the actual range of scores on the dependent variable
(5, p. 470). This narrowing of dispersion was eliminated
by multiplying each of the above weights by $1/R_{7.12345}$, the
reciprocal of the multiple correlation coefficient.

This gives the following equation:

$$ .14(2X_1) + .19(100 - X_2) + .19(X_3/5) + .07(2X_4) + .11(X_5/6) - 10.86 $$
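The dispersion-corrected equation can be applied to one subject's raw scores roughly as follows. This is a sketch under stated assumptions: the coding rules (doubling Knox and Porteus, subtracting Seguin from 100, grouping Healy in fives and Kohs in sixes) are read from Table VIII; the Knox weight is taken as .14, the value implied by the Knox column of Table IV (.28 for a raw score of 1.0); and the function and variable names are hypothetical.

```python
# Rounded weights and constant of the dispersion-corrected equation.
# The Knox weight of .14 is the value implied by Table IV (assumption).
WEIGHTS = {"knox": 0.14, "seguin": 0.19, "healy": 0.19,
           "porteus": 0.07, "kohs": 0.11}
CONSTANT = -10.86


def coded_scores(knox, seguin, healy, porteus, kohs):
    """Convert raw scores to coded scores, following Table VIII."""
    return {
        "knox": 2 * knox,        # 1.0 -> 2, 1.5 -> 3, ...
        "seguin": 100 - seguin,  # time score: a faster time gives a higher code
        "healy": healy // 5,     # 0-4 -> 0, 5-9 -> 1, ...
        "porteus": 2 * porteus,  # 4.0 -> 8, 4.5 -> 9, ...
        "kohs": kohs // 6,       # 0-5 -> 0, 6-11 -> 1, ...
    }


def performance_age(knox, seguin, healy, porteus, kohs):
    """Estimate Binet mental age (years) from the five raw scores."""
    coded = coded_scores(knox, seguin, healy, porteus, kohs)
    return sum(WEIGHTS[name] * coded[name] for name in WEIGHTS) + CONSTANT


# Hypothetical subject: Knox 5.0, Seguin 20, Healy 40, Porteus 8.0, Kohs 48.
print(round(performance_age(5.0, 20, 40, 8.0, 48), 2))
```

Because the weights here are rounded to two places and no half year has been added, the result will differ slightly from a sum of the tabled weighted scores.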


A table to give the weighted score for each possible raw score 
can be set up very easily with the aid of either a Monroe calculating 


TABLE III 


Table of b Weights for Five Variables Used Separately 
and in Combination 








3 4 Constant 





1.7668 

—18.5885 

7.8427 

7.9342 

1023 9.1187 

2352 3485 —21.6009 
-1860 2272 2496 —12.9843 
511 1941 2148 0861 —11.3602 
1267 1725 1763 .0609 1023 ~9.0744 





machine or an adding machine. Table IV gives the weighted
score for each possible raw score when all five tests are administered,
and the weights to be used when only the first four tests are
given will be found in Table V. In these tables the constant has
been deducted from the weighted score on test 2 (Seguin). As
Arthur did not allow for fractions of a year, a half year was added
to the constant before the table was set up. The methods used in
setting up these tables are similar to those described in detail by
Hull (5, p. 484).


TABLE IV
Table of Weighted Scores for Five Tests

Knox Cube      Seguin        Healy II        Porteus Maze   Kohs Block

Raw            Raw           Raw             Raw            Raw
Score  Weight  Score Weight  Score   Weight  Score  Weight  Score     Weight

1.0    .28     11    6.41    0-4     .19     4.0    .53     0-5       .00
1.5    .42     12    6.22    5-9     .39     4.5    .60     6-11      .11
2.0    .56     13    6.03    10-14   .58     5.0    .67     12-17     .22
2.5    .69     14    5.84    15-19   .77     5.5    .73     18-23     .34
3.0    .83     15    5.66    20-24   .97     6.0    .80     24-29     .45
3.5    .97     16    5.47    25-29   1.16    6.5    .87     30-35     .56
4.0    1.11    17    5.28    30-34   1.35    7.0    .93     36-41     .67
4.5    1.25    18    5.09    35-39   1.54    7.5    1.00    42-47     .78
5.0    1.39    19    4.90    40-44   1.74    8.0    1.07    48-53     .90
5.5    1.53    20    4.71    45-49   1.93    8.5    1.13    54-59     1.01
6.0    1.67    21    4.52    50-54   2.12    9.0    1.20    60-65     1.12
6.5    1.80    22    4.33    55-59   2.32    9.5    1.27    66-71     1.23
7.0    1.94    23    4.14    60-64   2.51    10.0   1.33    72-77     1.34
7.5    2.08    24    3.96    65-69   2.70    10.5   1.40    78-83     1.46
8.0    2.22    25    3.77    70-74   2.90    11.0   1.47    84-89     1.57
8.5    2.36    26    3.58    75-79   3.09    11.5   1.53    90-95     1.68
9.0    2.50    27    3.39    80-84   3.28    12.0   1.60    96-101    1.79
9.5    2.64    28    3.20    85-89   3.48    12.5   1.67    102-107   1.90
10.0   2.78    29    3.01    90-95   3.67    13.0   1.73    108-113   2.02
10.5   [?]     30    [?]                     13.5   1.80    114-119   2.13
11.0*  3.05                                  14.0   1.87    120-125   2.24
11.5   3.19                                  14.5   1.93    126-131   2.35
12.0   3.33                                  15.0   2.00    132-137   2.46
                                             15.5*  2.07
                                             16.0   2.13
                                             16.5   2.20
                                             17.0   2.27
                                             17.5   2.33

*Upper limit of criterion group.
The chief advantages of using this type of table for performance
tests are: (1) it estimates Binet mental age with the maximum 
degree of accuracy possible with any particular criterion group, 
(2) as the entire table can be copied on one sheet of paper the 
performance “mental age” can be determined from the raw scores 
in a few seconds, (3) the score on the performance test battery 


TABLE V
Table of Weighted Scores for Four Tests

Knox Cube      Seguin        Healy II        Porteus Maze

Raw            Raw           Raw             Raw
Score  Weight  Score Weight  Score   Weight  Score  Weight

1.0    .33     11    6.01    0-4     .00     4.0    .76
1.5    .50     12    5.80    5-9     .24     4.5    .86
2.0    .67     13    5.58    10-14   .48     5.0    .95
2.5    .84     14    5.37    15-19   .71     5.5    1.05
3.0    [?]     15    5.15    20-24   .95     6.0    1.14
3.5    [?]     16    4.94    25-29   1.19    6.5    1.24
4.0    [?]     17    4.72    30-34   1.43    7.0    1.33
4.5    [?]     18    4.51    35-39   1.66    7.5    1.43
5.0    [?]     19    4.29    40-44   1.90    8.0    1.52
5.5    [?]     20    4.08    45-49   2.14    8.5    1.62
6.0    [?]     21    3.86    50-54   2.38    9.0    1.71
6.5    [?]     22    3.65    55-59   2.61    9.5    1.81
7.0    [?]     23    3.44    60-64   2.85    10.0   1.90
7.5    [?]     24    3.22    65-69   3.09    10.5   2.00
8.0    [?]     25    3.01    70-74   3.33    11.0   2.10
8.5    [?]     26    2.79    75-79   3.56    11.5   2.19
9.0    [?]     27    2.58    80-84   3.80    12.0   2.29
9.5    [?]     28    2.36    85-89   4.04    12.5   2.38
10.0   [?]     29    2.15    90-94   4.28    13.0   2.48
10.5   [?]     30    1.93                    13.5   2.57
11.0*  [?]                                   14.0   2.67
11.5   [?]                                   14.5   2.76
12.0   [?]                                   15.0   2.86
                                             15.5*  2.95
                                             16.0   3.05
                                             16.5   3.14
                                             17.0   3.24
                                             17.5   3.33

*Upper limit of criterion group.
can be directly compared to the score on the Binet test, which is
not possible with any other method of weighting. 

The multiple correlation coefficients and the probable errors of 
estimate * for various combinations of tests are given in Table VI. 
Tables of weights for other combinations can be worked out by 
obtaining the Beta weights from the r’s by the Doolittle method. 
The means and standard deviations for each test are needed in 
writing the regression equation and these are given in Table VII. 
The method of coding used by the writer in the present study is 
explained in detail by Table VIII. 


TABLE VI

Multiple Correlation Coefficients and Probable Errors
of Estimate

Dependent   Independent                                       Probable  Coefficient
variable    variable(s)                                       error     (R)

X7(M.A.)    X1(Knox)                                          1.40      .735
X7(M.A.)    X1(Knox); X2(Seguin)                              1.04      .864
X7(M.A.)    X1(Knox); X2(Seguin); X3(Healy)                    .91      .898
X7(M.A.)    X1(Knox); X2(Seguin); X3(Healy); X4(Porteus)       .89      .904
X7(M.A.)    X1(Knox); X2(Seguin); X3(Healy);
            X4(Porteus); X5(Kohs)                              .84      .913
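The probable errors in Table VI appear to follow from the usual relation between the probable error of estimate, the standard deviation of the criterion, and the multiple correlation coefficient, P.E. = .6745 σ √(1 − R²). The paper does not state this formula, so its use here is an assumption; the sketch below simply shows that it reproduces the tabled values from the mental age standard deviation of Table VII. The function name is hypothetical.

```python
import math


def probable_error_of_estimate(sd_criterion, multiple_r):
    """Probable error of estimate: .6745 * sigma * sqrt(1 - R^2)."""
    return 0.6745 * sd_criterion * math.sqrt(1 - multiple_r ** 2)


SD_MENTAL_AGE = 3.0707  # standard deviation of variable 7 (Table VII)

# Multiple R for one to five tests, as listed in Table VI.
for tests, r in [(1, 0.735), (2, 0.864), (3, 0.898), (4, 0.904), (5, 0.913)]:
    print(tests, round(probable_error_of_estimate(SD_MENTAL_AGE, r), 2))
```

The practical reading is that, with all five tests, half of the estimated mental ages in the criterion group fall within about .84 year of the Binet mental age.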








THE TRIAL GROUP 


In order to try out the battery as a measure of general mental 
ability the first four tests were administered to twenty male, adult, 
non-psychotic patients in the Warren State Hospital. The group 
consisted of thirteen mental defectives, three psychopathic per- 
sonalities, three alcoholics, and one unclassified case. The mean 
chronological age of the group was 34.4 and the mean Binet mental 
age 10.1. In giving the performance tests the Seguin and Porteus 
were not inverted as Arthur did with the criterion group and the 
Porteus was begun with the three year maze. 

The chronological, Binet, and performance ages for each patient 


*It must be remembered that these probable errors of estimate (and of course 
the R’s as well) are for the criterion group and may not be the same for any other 
group. 
are given in Table IX, and the blank used for recording the scores
is shown in Figure I. The size and nature of the group hardly 
warrants a detailed statistical analysis but it is interesting to note 
that the mean performance age is 1.4 years lower than the mean 
Binet mental age. This difference does not disagree with the 
results of previous studies made by Babcock (3) and Knight (7). 

Babcock (3, p. 534) using the Short Army Performance Scale
found that the score on this scale fell further and further behind
Binet mental age as the chronological age of her subjects increased.
In her study, however, the defectives tended to score higher on
the Army Scale in relation to mental age than the normals.
Knight (7), using the Arthur Scale, found the mean IQ of 152
white mental defective children to be 57.66 and the mean
“Performance Quotient” to be 59.37. She also tested a normal group
of 85 and found the mean “Performance Quotient” to be 104.15
as compared to a mean Binet IQ of 99.27.


TABLE VII
Mean and Standard Deviation for Each Variable

Variable     1         2         3         4         5         6         7

Mean         12.8079   81.1831   7.4560    20.5530   5.7199    9.4057    9.7038
Standard
Deviation    3.6421    5.6682    4.4890    5.9347    6.0305    3.1464    3.0707


TABLE VIII

Table Illustrating Method of Coding Used with Each of the Five Tests

Knox Cube      Seguin         Healy II        Porteus Maze   Kohs Block
Raw    Coded   Raw    Coded   Raw     Coded   Raw    Coded   Raw       Coded
Score  Score   Score  Score   Score   Score   Score  Score   Score     Score

1.0    2       10     90      0-4     0       4.0    8       0-5       0
1.5    3       11     89      5-9     1       4.5    9       6-11      1
2.0    4       12     88      10-14   2       5.0    10      12-17     2
2.5    5       13     87      15-19   3       5.5    11      18-23     3
3.0    6       14     86      20-24   4       6.0    12      24-29     4

10.0   20      28     72      80-84   16      16.5   33      120-125   20
10.5   21      29     71      85-90   17      17.0   34      126-131   21
11.0   22      30     70      90-95   18      17.5   35      132-137   22


WARREN STATE HOSPITAL
Psychological Laboratory

PERFORMANCE TEST

Nationality                     Language difficulty
Reached grade          at          yrs. of age
Birth-date                      Binet age
Remarks                         Examiner

                                          Score     Points     Years

1. Knox Cube Test (average of 2 trials)
        1st   2nd   1st   2nd
   A
   B
   C
   D

2. Seguin Form Board
   1st Trial
   2nd Trial
   3rd Trial

3. Healy P. C. II
   Picture   Block   Score      Picture   Block   Score
   I                            VI
   II                           VII
   III                          VIII
   IV                           IX
   V                            X

4. Porteus Maze Test

5. Kohs Block Design

Total Points


FIGURE I

Record Sheet Used for Performance Test Battery
In considering the present group it must be remembered that
it consists only of those individuals who had gotten into specific 
difficulties in the community and were thought to be mentally 
abnormal. In several instances where marked differences in scores 
are found a study of the case record may be enlightening. The 
following cases are cited not because conclusions can be drawn 
from them but because they may suggest a problem worthy of 
future study. 

Case No. 8 was a boy 20 years of age who scored 12.8 on the 
Binet and 9.6 on the performance tests. By every criterion except 
the Binet he was a low grade moron, having spent several years 
of his school life in the class for subnormals because of his inability 
to learn. He had always been considered queer and was sent to 
the hospital for observation because he had long been a general 
nuisance in the neighborhood and continued to turn in false 
fire alarms in spite of repeated warnings. Both the teacher of 
the orthogenic class and the psychiatrists who had studied his 
record believed him to be mentally defective. 

Case No. 12 was a farm hand, aged 28, with a Binet mental 
age of 12.6 and a performance age score of 9.5. He was sent to 
the hospital because he was considered to be eccentric and when 
he borrowed a neighbor’s horse without permission psychiatric 
observation was advised. He stated that his ambition was to go 
west and be a cowboy. He stuttered to a marked degree but 
said that he did this because he wanted to and did not care 
to have help in attempting to rid himself of the habit as he liked 
to stutter. 

Case No. 14 was a psychopathic personality of long standing. 
His chronological age was 42, his Binet mental age 14.8, and his 
performance age score 11.7. He had gotten into various difficulties 
since early adolescence and had been a user of both alcohol and 
drugs over a period of years. 


SOME MINOR CHANGES IN SCORING 


In determining the linearity of regression between the various
tests a rather interesting point in regard to the time score tests
was brought out. In order to have linearity with these tests an
optimal time limit must be set and all subjects taking longer than
this are scored with the maximum time allowed. In other words,
the use of a time score above the optimal limit actually decreases
the potency of the test. The optimal time limit for the Seguin
form board is 30 seconds, and for the Five Figure form board
(which was not included in the final battery) 2 minutes and 40
seconds.


TABLE IX
Binet and Performance Age Scores of Twenty State Hospital Patients

Case    Chronological    Binet          Performance
No.     Age              Mental Age     Age Score

 1      40                7.8            6.6
 2      46                7.9            5.4
 3      43                8.2            7.5
 4      41               10.7            6.9
 5      44                7.8            4.7

 6      44               12.4           11.8
 7      17               14.2           13.5
 8      20               12.8            9.6
 9      27                6.8            5.7
10      24                8.3            8.1

11      35                7.0            8.3
12      28               12.6            9.5
13      35                7.8            6.4
14      42               14.8           11.7
15      43               12.7           11.5

16      45                9.5            9.3
17      19                8.5            9.1
18      35                7.2            3.5
19      26               11.3           10.7
20      36               15.8           13.8

In scoring Healy P. C. II Arthur has apparently used a method
which differs slightly from that devised by the test's author,
Healy (4). Although she makes no mention of it in either The
Clinical Manual (1) or The Process of Standardization (2), it
appears from her table (1, p. 74) that where Healy (4, p. 231)
uses a negative score of −5 for individual pictures she uses a
score of zero. The elimination of negative scores is highly
desirable when possible and this change probably does not greatly
alter the value of the test.


SUMMARY 


An attempt is made to treat statistically the performance test 
scores made by 557 children, ranging in age from 5 to 15 years, 
who were tested by Arthur and her coworkers. 

A table of intercorrelations between the Knox cube test, Seguin 
form board, Healy P. C. II, Porteus maze test, Kohs block design 
test, and Binet mental age is presented with chronological age 
both included and partialled out. 

Optimal weights for each test used separately and for com- 
binations of from 2 to 5 tests are given. 

Weighted scores for the total battery are given. They have 
also been worked out for all possible raw scores when only the 
first 4 tests are used. The sum of these weighted scores gives the 
“performance age” which most closely approximates Binet mental 
age. 

The application of the test battery to a trial group of twenty state 
hospital patients is described. 
EIGHTH INTERNATIONAL PSYCHOTECHNIC CONGRESS
(Continued from page 618) 


lands, psychology needs for its practical work a supreme authentic 
law in order that its work be free from a too controlling influence 
of a contemporary philosophy of life. 

Professor Spearman (London) reported on the recent attempts 
to investigate human personality, at least in its psychotechnical 
aspects. The tendency prevails at present to understand per- 
sonality by means of some lesser traits of character. In co- 
operation with other investigators the speaker has undertaken 
numerous researches to discover methods and findings of the 
already existing experiments. This joint work can be successful 
only if it results in the mutual understanding and cooperation of 
international psychologists.

The further work of the Congress was divided into several 
sections: industrial, vocational, commercial, statistical, and 
pedagogical. This last was really meant for the Czech teachers. 
The results of individual researches were especially reported. 
The commercial section offered many things of interest. Vana 
and others (Prague), Laugier and Weinberg (Paris), Heinis 
(Geneva), Ponzo (Rome), and others reported on chauffeur 
tests. In spite of many conscientious studies in this field we have
not succeeded in ascertaining which abilities and traits are
decisive for the successful driving of automobiles. The report
by Wojciechowski (Poland) on the tests of railroad
personnel was challenging. The examination methods which
are approved in Warsaw proved worthless in Posen. The regional
confirmation of these tests is therefore still a new and no small
undertaking. One of the reasons for the limited application of
psychotechnics to train traffic is, according to Miles (London),
the conflict between the attitude of the practitioner, who would
like to have the test as quick and cheap as possible, and
that of the scientist, who would work slowly and thoroughly.
Traffic accidents, Lahy and Korngold (Paris) conclude, are 
caused in many cases by the incapacity of the worker to submit 
his responses to the required rhythm of the work. Mizzi (Milan) 

(Continued on page 695) 








THE EFFECT OF APPROBATION AND 
REPROOF ON THE MASTERY OF 
NONSENSE SYLLABLES 


THEODORE W. WOOD 


University of Cincinnati 


INTRODUCTION:—Studies on the use of approbation and
reproof as incentives for learning have been carried on in
various institutions. The results that have been obtained
conflict to no small degree.
Gilchrist * in his study, The Extent to Which Praise and Reproof
Affect a Pupil’s Work, in an experiment carried on with fifty 
college students, states in the summary of his work that: 


(a) The group that was praised improved the group score 79 
per cent. 

(b) The group that was reproved made a lower group score 
on the second test than on the first. 

Gates and Russland * in a study, The Effect of Encouragement 
and Discouragement Upon Performance, in color naming and 
coordination tests, made the following statements in reference to 
their results: 

(a) The percentage of individuals who improved is in both 
tests higher for the encouraged than discouraged groups, and 
higher for the latter than the repetition groups. 

(b) The small percentage of improvement under encouragement,
9 per cent in one test and .007 in the other, contrasts with
the 79 per cent found by Gilchrist; similarly the improvement
found under discouragement 6 per cent and .01 is in disagreement
with the deterioration of 6 per cent found by the former
writer (Gilchrist).


*Gilchrist, E. The Extent to Which Praise and Reproof Affect a Pupil’s Work.
School and Society. Dec. 2, 1916. Vol. IV. Pp. 872-4.

*Gates, G. S. and Russland, L. O. The Effect of Encouragement and of
Discouragement Upon Performance. The Journal of Ed. Psyc. Vol. XIV, No. 1.
Jan., 1923. Pp. 21-26.

Hurlock * in her study, The Value of Praise and Reproof as 
Incentives for Children, issues the following statements as part 
of her conclusions: 


(a) That praise and reproof are incentives which may be used 
effectively as motivators for school work and that on the whole 
they are of equal value. 

(b) That older children respond more to both praise and re- 
proof than do younger ones. Praise is slightly more effective for 
older children and reproof for younger. 

PURPOSE OF EXPERIMENT :—With the knowledge of the 
results obtained on the three above-mentioned studies, the author 
of this paper carried on an experiment using thirty college stu- 
dents. The students selected were mainly from the upper two 
classes and the graduate level. They were divided into groups 
of ten, thus constituting one control, one approbation and one 
reproof group. 

The learning situation or test in the first experimental pro- 
cedure for the three groups involved the mastery of the following 
nonsense syllables which were equated for difficulty—KEX, SIG, 
ZEP, BAB, JID, DEJ. During the second part of the experi- 
mental procedure, the following nonsense syllables were used on 
all three groups: LUP, GEC, MAZ, VUV, NEN, JAX. 

The following values were assigned in the scoring of the tests 
of nonsense syllables: 


(a) Correct reproduction and order—5 

(b) Correct reproduction and out of order—4 

(c) Two letters right—3 

(d) Reverse order of syllables—2 

(e) No reproduction—0 
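The scoring scheme above amounts to a small lookup table. A minimal sketch follows, assuming the experimenter has already judged each of the six reproduced syllables and placed it in one of the five classes; the class labels and the function name are hypothetical, and the judging itself is not automated here.

```python
# Points per class of reproduction, following items (a) to (e) above.
# The dictionary keys are hypothetical labels, not the author's wording.
POINTS = {
    "correct_in_order": 5,      # (a) correct reproduction and order
    "correct_out_of_order": 4,  # (b) correct reproduction, out of order
    "two_letters_right": 3,     # (c) two letters right
    "reversed": 2,              # (d) reverse order
    "no_reproduction": 0,       # (e) no reproduction
}


def score_test(judgments):
    """Sum the points for one subject's judged reproductions."""
    return sum(POINTS[judgment] for judgment in judgments)


# Hypothetical record for one subject on a six-syllable list.
record = [
    "correct_in_order", "correct_in_order", "correct_out_of_order",
    "two_letters_right", "reversed", "no_reproduction",
]
print(score_test(record))
```

On a six-syllable list the largest possible score under this scheme is 30.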

In this experiment the author sought to obtain some objective 
indicators of the emotional changes which transpire while using 
approbation and reproof as incentives for learning. With this in 
mind the psychogalvanometer was used on both tests for every 





*Hurlock, E. B. The Value of Praise and Reproof as Incentives for Children. 
Archives of Psyc. July, 1924. Pp. 71-78. 
subject and the readings carefully recorded. A fifteen-second exposure
was allowed for each nonsense syllable.

PROCEDURE:—During both tests the subject was seated in an 
armchair with galvanometric connections and facing a memory 
drum which was inserted in a large cardboard frame to shield the 
subject from view of the experimental procedure. 

For all groups in the first test the following directions were 
given to the subject: 

“You are to be presented with a series of nonsense syllables. 
They will appear singly at the window of the memory drum at 
regular intervals. You are to remember each of the nonsense 
syllables in order of their appearance. Use whatever device (such
as repeating each one aloud) that will facilitate your mastery of 
these syllables, as you are going to be required to reproduce them 
later on.” 

For the control group in the second test the following directions 
were given: 

“We are repeating the previous test to verify their results using 
different nonsense syllables. The directions for the present test 
will be exactly the same as the directions for the previous one.” 

The remaining two groups were respectively approbated and 
reproved in the following manner— 


FOR THE APPROBATION GROUP 


“Mr. or Miss (So and So), we have examined your reproduction
of the nonsense syllables presented the last time you were
here. We have found your memory of these nonsense syllables 
to be decidedly superior to the average. Your memory score tends 
to indicate your degree of intelligence. We are giving you this 
additional test in the hopes of bearing out the findings of the first 
test.” 


FOR THE REPROOF GROUP


“Mr. or Miss (So and So), we have examined your reproduc- 
tion of nonsense syllables presented the last time you were here. 
We have found your memory of these syllables to be decidedly 
inferior to the average. Your memory score tends to indicate 
your degree of intelligence. We are giving you this additional
test in the hopes of your demonstrating a better memory of the 
presented nonsense syllables.” 

Immediately after the final test had been given each subject, 
introspections were made to ascertain— 

(1) What the subject thought the test was given for. 

(2) The method used in learning the nonsense syllables. 

(3) The emotional effect of approbation and reproof on the 
subject as compared with the first test when neither incentive 
was used. 

(4) Whether the subject thought the approbation or reproof 
was genuine. 

RESULTS: The following tables for each group show:

Firstly—The results of the two tests for each group in terms of 
average group scores and average galvanometric deflections
according to Winch’s Equal Grouping method.

Secondly—The results obtained after statistical treatment of 
group averages to ascertain the P. E. in each group test record and 
the P. E. of the difference between the two tests for both scores 
and galvanometric deflections.




















Control Group

                 First Test                                   Second Test
                 Av. Group  Galv.       Av. Group            Av. Group  Galv.       Av. Group
Subject  Score   Score      Deflection  Galv. Defl.  Score  Score      Deflection  Galv. Defl.
A        28      28         12          12           20     20         6.83        6.83
B        23                 14.17                    21                9.17
C        23                 9.50                     24                8.33
D        22                 9.33                     13                12.50
E        21      22.25      6.17        9.80         21     19.75      12.67       10.67
F        18                 18.33                    30                15.33
G        18      18         9.17        13.75        21     25.50      11.33       13.13
H        16                 7.83                     16                11
I        15      15.5       27.17       17.50        19     17.50      11.83       11.47
J        12      12         8.17        8.17         18     18         14.67       14.67

                                     First Test   Second Test
Av. Score                            19.6         20.3
Total Av. Group Score                95.75        100.75
Av. Galv. Deflection                 12.184       11.366
Total Av. Group Galv. Deflection     61.22        56.77

According to Winch’s Equal Grouping method the reproof
group improved 26.39 per cent while the approbation group only 
improved 17.50 per cent over the first tests. The reproof group 
showed an average group galvanometric deflection .28 m.m. 
higher; while the approbation group showed an increase of 
0.59 m.m. higher in the second test than in the first test. The 








Approbation Group

                 First Test                                   Second Test
                 Av. Group  Galv.       Av. Group            Av. Group  Galv.       Av. Group
Subject  Score   Score      Deflection  Galv. Defl.  Score  Score      Deflection  Galv. Defl.
A        24                 22.17                    23                9
B        24                 8.67                     28                10
C        23                 16.33                    30                19.33
D        23      23         28.67       16.89        30     28.5       [illegible] 17.39
E        22                 9                        30                14.33
F        22                 16.50                    30                20.50
G        20                 8.00                     25                15.67
H        20      20         .33         4.17         18     21.5       1.50        8.59
I        15                 22.67                    28                16.83
J        14      14.5       8.33        15.50        22     25         5.50        11.17

                                     First Test   Second Test
Av. Score                            20.7         26.4
Total Av. Group Score                57.50        75
Av. Galv. Deflection                 14.07        14.83
Total Av. Group Galv. Deflection     36.56        37.15

Reproof Group

                 First Test                                   Second Test
                 Av. Group  Galv.       Av. Group            Av. Group  Galv.       Av. Group
Subject  Score   Score      Deflection  Galv. Defl.  Score  Score      Deflection  Galv. Defl.
A        30                 6.67                     30                3
B        30      30         [illegible] [illegible]  28     29         [illegible] [illegible]
C        24                 15.83                    22                10.33
D        23      22.33      13.83       12.50        28     25         16.50       12.17
E        20                 7.83                     25                9.67
F        18                 17.33                    17                19
G        18                 14.33                    30                11.50
H        18      17.78      10.33       12.92        17     22.5       18          15.83
I        17                 9.67                     26                14.83
J        10      10         12.17       12.17        30     30         8           8

                                     First Test   Second Test
Av. Score                            20.8         25.3
Total Av. Group Score                80.11        106.5
Av. Galv. Deflection                 11.768       12.80
Total Av. Group Galv. Deflection     [illegible]  [illegible]

control group showed a 5 per cent increase in memory score,
probably due to practice effect, while galvanometric deflection 
difference between the first and second tests showed a surprising 
increase of 4.45 m.m. This, as seen, is a much greater difference 








Av. of Galvanometric Deflections with Standard Deviations, P.E., and P.E. Diff.*

                    1st Test                 2nd Test
Group          Av.     S.D.   P.E.      Av.     S.D.   P.E.      Diff.                        P.E. Diff.
Control        12.184  4.90   1.04      11.366  2.52   .53       .818 (lower on 2nd test)     1.16
Approbation    14.067  8.02   1.70      14.283  7.06   1.51      .216 (higher on 2nd test)    1.16
Reproof        11.768  3.31   .702      12.80   4.80   1.02      1.03 (higher on 2nd test)    2.27

Av. of Scores with Standard Deviations, P.E.’s and P.E. of Difference

                    1st Test                 2nd Test
Group          Av.     S.D.   P.E.      Av.     S.D.   P.E.      Diff.    P.E. Diff.
Control        19.6    4.45   .948      20.3    4.34   .925      0.7      1.327
Approbation    20.7    3.18   .678      26.4    4.04   .861      5.7      1.09
Reproof        20.8    5.82   1.241     25.3    4.79   1.021     4.5      1.606





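*In obtaining the S.D., P.E., and P.E. Diff. the following formulas were used:

\[ \text{S.D. or } \sigma = \sqrt{\frac{\Sigma D^{2}}{N}} \]

\[ \text{P.E.}_{M} = \frac{.6745\,\sigma}{\sqrt{N}} \]

\[ \text{P.E. Diff.} = \sqrt{(\text{P.E. Av.}_{1})^{2} + (\text{P.E. Av.}_{2})^{2}} \]
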
than that which appeared in the approbation and reproof groups.
This fact is attributed by the author to the “suspense element” 
which may be induced by a lack of knowledge of why their 
efforts were being repeated and expended. 

The results using this method of grouping seem to indicate 
that reproof is a better incentive than praise and that there is 
very little difference in the emotivity of the individuals when 
either incentive is used. 
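
The grouping arithmetic behind these figures can be sketched as follows. This is only an illustration under stated assumptions: the report does not describe how the groups were formed, so the group boundaries below are inferred from where the group averages are entered in the control table, and the function name is hypothetical.

```python
def group_means(scores, group_sizes):
    """Split an ordered list of scores into consecutive groups and average each."""
    means, start = [], 0
    for size in group_sizes:
        chunk = scores[start:start + size]
        means.append(sum(chunk) / len(chunk))
        start += size
    return means


# Control group scores, subjects A to J, as entered in the table.
first = [28, 23, 23, 22, 21, 18, 18, 16, 15, 12]
# G's second score is read as 21, the value consistent with the
# printed group mean of 25.50 and the printed average of 20.3.
second = [20, 21, 24, 13, 21, 30, 21, 16, 19, 18]

# Assumed grouping, inferred from the table: A | B-E | F-G | H-I | J.
sizes = [1, 4, 2, 2, 1]

total_first = sum(group_means(first, sizes))
total_second = sum(group_means(second, sizes))
print(total_first, total_second, total_second - total_first)
```

Under these assumptions the totals reproduce the 95.75 and 100.75 entered for the control group, and their difference is the figure of 5 quoted in the text.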

This should not be taken as final in regard to the emotional 
status of the individual under the influence of these incentives. It 
is to be remembered that there is a possibility of an “intellectual 
feeling” in response to a situation. Bleuler has shown that such 
“intellectual feelings,” which are memory processes, are utterly 
different from affectivity. It is only “affectivity in the strictest 
sense” * that has definite effects upon body and mind. The gal- 
vanometer measures only true affective processes, a part of which 
is the skin galvanic response. 

It might be further added that there was some difficulty in 
technique. Often the response was retarded and therefore not 
fully recorded by the galvanometer because it was necessary to 
present the next nonsense syllable. The rate at which the syl- 
lables were presented, every fifteen seconds, did not permit the 
skin galvanic response to be fully recorded. 

Statistical interpretation of the average scores and galvanometric
readings of the approbation and reproof groups shows that
approbation increased the average score by 5.7, with a P.E.
Difference of 1.09, while reproof increased the average score by 4.5,
with a P.E. Difference of 1.606. In the control group the average
score was .7 higher on the second test than on the first, with a P.E.
Difference of 1.327; since this probable error exceeds the difference
itself, the control gain is of little value. The small increase
that was made under the second control test could no doubt be
attributed to the practice effect.
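
A minimal sketch of this computation, using the control group's scores, is given below. It assumes the standard deviation is taken about the mean with N in the denominator, the probable error of an average is .6745 S.D. / √N, and the probable error of a difference is the root of the sum of the two squared probable errors. The function names are hypothetical, and G's second score is read as 21.

```python
import math


def mean(values):
    return sum(values) / len(values)


def sd(values):
    """Standard deviation about the mean, dividing by N."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def pe_mean(values):
    """Probable error of the average: 0.6745 * S.D. / sqrt(N)."""
    return 0.6745 * sd(values) / math.sqrt(len(values))


def pe_diff(pe1, pe2):
    """Probable error of the difference between two averages."""
    return math.sqrt(pe1 ** 2 + pe2 ** 2)


# Control group scores, subjects A to J (G's second score read as 21).
first = [28, 23, 23, 22, 21, 18, 18, 16, 15, 12]
second = [20, 21, 24, 13, 21, 30, 21, 16, 19, 18]

diff = mean(second) - mean(first)
pe_d = pe_diff(pe_mean(first), pe_mean(second))
print(round(sd(first), 2), round(sd(second), 2))
print(round(diff, 2), round(pe_d, 3), round(diff / pe_d, 2))
```

The ratio of the difference to its probable error, printed last, is the quantity on which the judgment of reliability rests; for the control group it is well below 1.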

The P.E. Difference of the average galvanometric deflections 
of each group in each case was larger than the differences them- 
selves, which would seem to indicate that the possible errors which 


*Dr. L. Binswanger. The Psychogalvanic Phenomenon in Associated Experi- 
ments. Jung’s “Studies in Word Association.” 
were mentioned previously played an important rôle in making
this part of the data of little value.

Introspection revealed that the majority of students thought 
that: 


(1) The test was either one of memory or intelligence. 

(2) The methods used in learning nonsense syllables were 
either by association with words having similar beginnings or by 
constant repetition. 

(3) The use of approbation made seven of the ten subjects feel 
calmer and more confident in the second test, while reproof made 
six of the subjects want to make a better score in the second test; 
four subjects admitted that reproof at first made them feel less 
confident and nervous. 

(4) The use of approbation was thought by five of the subjects 
to have been genuine. Three more in the approbated group 
thought it was some method of encouragement, while two thought 
the approbation was either a psychological trick or false. Eight 
of the subjects in the reproof group thought that the reproof 
was genuine while one was amused and another claimed indif- 
ference.* 


CONCLUSIONS 


(1) That approbation and reproof are of practically equal value 
as incentives for learning among college students. 

(2) That galvanometric phenomena may be of value in ascer- 
taining the accompanying emotional states with praise and re- 
proof under proper experimental technique. 

(3) From this study the so-called Winch’s Equal Grouping
Method was found to be an unreliable technique when compared
with statistical evidence.


*The different reactions of subjects to the same objective form of reproof and 
approbation suggest an inevitable difficulty in experiments of this kind. What 
is intended as praise, may be uncritically accepted as such by some, calmly analyzed 
by others, or actively resented by a third group, resulting in a wide variation of 
the motivational intensity in different members of the same group. 





THE OPTIMUM LENGTH OF ADVERTISING 
HEADLINE 


D. B. LUCAS 
New York University 


THE optimum length of advertising headline has been a subject
for debate ever since psychologists first approached the field of
advertising. Definite statements of principle have been laid down
by some psychologists* and advertising theorists, restricting the
correct headline to four or five words. The basis for this principle
is found in laboratory studies of attention and memory. The subject
matter of the tests has seldom included headlines of actual
advertisements.

Advertisers have protested against some of the rules which they 
credit to psychologists, and many violations are found in practice. 
Writers of headlines have placed more emphasis upon the ad- 
vertised idea than upon the number of words required to express 
the idea. The authors of recent psychologies of advertising (since 
1930) have apparently realized that advertising practitioners are 
making no attempt to keep headlines within the four word limit. 
Lacking positive, convincing evidence as to optimum length of 
headline, most of the writers have preferred to avoid making 
specific or even useful recommendations. 

The following report is a summary of the results of three studies 
bearing on the optimum length of advertising headline. The 
ability of the reader to remember the headline under varying condi- 
tions was used as the criterion of effectiveness. Scores for test 
headlines which were secured by three widely different methods 
were compared with the length of the headline, as measured in 
different units. The closeness of the relationship between the 
shortness of a headline and the ease with which it may be recalled 
by a reader will be made more clear through a discussion of the 
general procedure followed in testing. 


*Notably, Starch, Principles of Advertising, (1923) p. 494. 
GENERAL PROCEDURE


Length of the headline was the controlled variable in the tests. 
Scores for success in recalling the headlines were obtained by three 
methods. The advantages peculiar to each method will be enu- 
merated in the process of evaluating the results. Preliminary 
studies of the influence of length of headline were made by the 
writer in 1930. The purpose of this first work was to establish 
an optimum length. Actually, the tests and results served only 
to refine the technique, to uncover difficulties and to determine 
limits for exposure and recall. 


TEST MATERIAL 


The headlines chosen for the 1930-’31-’32 tests were selected al- 
most entirely from issues of the Saturday Evening Post for the 
year 1929 and early 1930. The same group of test headlines was 
used throughout the study. Headlines ranging from 4 words up 
to 15 were found to cover a desirable test range. Headlines of 4 
words or less were recalled immediately after one reading by all 
subjects. Less than one subject in a hundred could read and recall 
any headline perfectly if it contained more than 15 words. Each 
headline was set in a uniform size and style of type face, so that 
no one word was emphasized more than the rest. Care was used 
to select headlines which were not descriptive of the illustrations 
which they accompanied. All were taken from full-page adver- 
tisements. The test headlines are listed in the table. 


SUBJECTS 


All of the subjects were students in psychology of advertising 
in the School of Commerce of New York University. While 
they were quite generally “advertising conscious,” they were not 
aware of the techniques being used for each test until after 
the whole investigation was completed. There were about equal 
numbers of day students and evening students. A total of more 
than 100 subjects, 23 women and 98 men, worked on the tests. 
There was a slight interchange of subjects in the tachistoscope 
test owing to absences from class. The resultant irregularity in 






















the number of subjects for each headline is accounted for, as 
will be shown in the table. 


METHODS OF TESTING 


The first technique to be described concerns the measures of 
the length of a headline. Headlines which contain the same 
number of words may differ in number of syllables or letter spaces, 
in linear measurement and in ease of comprehension. It was 
suggested that a personal estimate of length might be made in 
units which were designated as “thought units.”* This term 
was not defined and was used merely to suggest an arbitrary 
basis for ascribing numerical values to judgments. A separate 
group of subjects was given the tested headlines and asked to 
rate each one in whole thought units. Their estimates for each 
headline were averaged and compared with ratings made by the 
experimenter. The correlations (r) for this study and for simi- 
lar succeeding studies were consistently above +.90. The com- 
posite score of the group is used in the table, since the average 
length has the advantage that it can be carried out to decimals. 
The purpose of this accessory study was to learn whether thought 
units would prove to be a more practical measure than are the 
more objective units in common use. 

The techniques upon which the three sets of memory scores 
are based will be described in order, followed by a summary tabu- 
lation of the results. In Test 1 forty-five subjects were provided 
with envelopes containing the test headlines on separate slips. 
At a given signal they were asked to remove the slips from the 
envelopes and to read each of them once in whatever order they 
occurred. As each finished he was asked to return the slips to 
the envelope, at the same time removing a blank sheet of paper. 
At the end of four minutes all of the subjects had read each head- 
line. They were then allowed five minutes in which to write 
down as many of the headlines as they could remember. The 
number of times each headline was recalled is shown in column 


*The term “thought units” has been used by others, usually without clear 
definition, but sometimes compared with the number of “emphasized words” in 
a sentence. 
1 of the table. In this one test the experimenter found it necessary
to make a liberal allowance for errors because the number of 
perfect reproductions was so small. 

A box tachistoscope was used in the second test for exposing 
the material individually to a group of more than fifty subjects. 
The preliminary studies had established an exposure of 1% sec- 
onds duration as satisfactory. The advertisements were presented 
in a random order. Each was exposed individually before each 
subject. Each student recorded the headline displayed immedi- 
ately after he had seen it. The conditions of this test were not 
hard to control. The scores, which are the percentages of perfect 
reproductions for each headline, are shown in column 3 of the 
table. Percentages are used because of the slight variation in 
number of subjects referred to above. Each score is based on the 
number of subjects who saw that particular headline. Errors in 
punctuation and in the use of capitals were not taken into account. 

Four weeks after the completion of Test 2, the subjects who 
had seen all of the headlines were given a special exercise. They 
were provided with a list of the tested headlines together with a 
set of the test scores. They were then asked to compute the cor- 
relation between the scores for recall and the length of each 
headline. This problem was turned in to the instructor along with 
the data sheets. 

Six weeks later the subjects who had worked the correlation 
problem were provided with envelopes containing slips on which 
were printed the tested headlines. An equal number of con- 
temporary headlines, selected at the same time and under similar 
conditions, was inserted. The students were asked to sort out the 
headlines which they had seen in the class tests and exercises and 
to replace them in the envelopes. These scores, representing 
Test 3, are recorded as raw scores in column 2 of the table. While 
the scores are low there were very few incorrect headlines re- 
turned to the envelopes. There was no reason to suspect the 
subjects of guessing, and guessing should not have influenced the 
relative scores. For these reasons the correction formula was not 
applied. All of the tabulated scores, together with the computed 
correlation coefficients (Pearson product-moment “r”), are shown
in the following table. 
Summary of Recall Scores For Advertising Headlines

Even the clothes you buy are better because of Wyandotte
What other gift could possibly bring so much of lasting pleasure!
Millions recognize the superior flavor .. the richer nutrition in Quaker Oats
When dull film covers teeth smiles lose fascination
Heat alone is not comfort
A happy thought .. the Sampler!
As the great limited speeds through the night
The continent that became a neighborhood
Correct time is on tap at your light socket!
Before he could pronounce it he used it
How many doors do you shut when guests arrive?
Put this Kelvinator freezing unit in your refrigerator
“It put my little girl on her feet again!”
“RCA Radiotrons bring out the full tone beauty”
Is beauty worth safeguarding?
Why ordinary beauty treatments fail
Buy a used Buick and enjoy the thrill of a big, powerful car
Are your taxes buying the pipe that lasts a century?
Children especially require healthful vegetable foods!
Public preference chooses the inimitable “Chrysler 60”
One treatment with Whiz cured our car of “Rheumatism of the gears”!
The new Oldsmobile is the lowest priced car with the syncro-mesh transmission
New Orleans stays up to eat .. and gives me a grand fish story
This cleansing foam gives teeth an extra protection
Down from Canada came tales of a wonderful beverage
Four girls speak out in meeting about complexion
A ripple of good will that became a wave of preference
An old beauty secret
A girl can’t be too careful
Summary of Recall Scores For Advertising Headlines (continued)

a     b     c     d     e     Headline
[illegible]                   See how Super Suds goes to work below the water line in your dishpan
4     8     49    3.0   10    The greatest value ever offered in a fifty dollar suit
[illegible]       2.3   5     Walnut days are here again
[illegible]       1.5   4     Front drive is safer
[illegible]       3.0   6     Good looks—fine quality—thrilling performance
[illegible]       2.8   8     A thrifty mate for cars that squander gasoline
[illegible]       2.0   6     The packers know a good investment
[illegible]       2.2   6     Swift death to germs of disease
[illegible]       2.3   6     Roads that grow as you grow
[illegible]       2.3   7     “Just tell me two things about paint”
[illegible]       1.9   6     A clean house in every package
[illegible]       1.8   6     A big help in driving tests
[illegible]       2.5   8     Where the horse gives way to an Exide
[illegible]       2.0   6     We went wool-gathering in England—
[illegible]       2.0   7     The same good car through many seasons
[illegible]       2.3   7     That people may live and be happy

Correlations (r) and Probable Errors

Correlation of “thought units” with Words ........................ r = +.88 ±.02
Recall (uniform exposure in tachistoscope) with “thought units” .. r = -.85 ±.03
Recall (uniform exposure in tachistoscope) with Words ............ r = -.80 ±.04
Correlation of Later Recognition with “thought units” ............ r = +.16 ±.12
Recall of Slips (each read once) with “thought units” ............ r = -.19 ±.11





RESULTS 


There is a high positive correlation (r = +.88 ±.02) between
the estimated thought units and the number of words in each 
headline. The subjective estimate of the length of a headline 
is usually, then, a function of the number of words. There are 
some clear exceptions. This close relationship is further shown 
by the near equality of the negative correlations which follow in 
the next paragraph. 

The scores for recall secured with the tachistoscope correlate 
highly and negatively (r = -.85 ±.03) with the length of the
headlines in thought units. The correlation is similar, though
lower (r = -.80 ±.04) when length in words is substituted.
Because of this slight difference in favor of the measure in thought 
units, the correlations which follow are presented on this same 
basis. 

There is a low, negative correlation (r = -.19 ±.11) between
the recall of slips (Test 1) and the length of the headlines in
thought units. While each subject read the slips in a random 
order, it should be noted that certain headlines were favored by 
recency in the case of each subject. There is some reason there- 
fore to believe that this and other uncontrolled factors may have 
reduced the coefficient below the real correlation. 

The correlation between the scores for recognition (Test 3,
column 2) and thought units is low and positive (r = +.16
±.12). This correlation, even though it may actually be lower
than the true relationship, seems of doubtful significance.
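
For readers who wish to repeat such a computation, the following sketch obtains a product-moment coefficient and a probable error. Two points are assumptions rather than statements of the report: the probable error is taken by the conventional formula .6745 (1 − r²) / √N, which the report does not quote, and the lengths and recall percentages shown are invented for illustration and are not the tabulated scores.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pe_of_r(r, n):
    """Conventional probable error of r (assumed, not quoted in the report)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)


# Hypothetical data: headline length in words and per cent perfect recall.
words = [4, 5, 6, 7, 8, 10, 12, 15]
recall = [90, 85, 70, 72, 55, 49, 30, 10]

r = pearson_r(words, recall)
print(round(r, 2), round(pe_of_r(r, len(words)), 2))
```

A coefficient several times its probable error, as with the tachistoscope scores, is ordinarily accepted as reliable, while one scarcely larger than its probable error, as with the recognition scores, is not.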


INTERPRETATION 


The high negative correlation between the immediate recall 
scores for the tachistoscope test and length of the headlines favors 
the shorter headlines. This method of test is based on the
assumption that those who thumb through magazines hurriedly give
the same amount of time to the headline of each advertisement.
In reality, it is probable that hasty readers leave many of the 
longer headlines unfinished, except when a high degree of inter- 
est is aroused. It would seem that, other things being equal, 
there is a maximum time limit for the reading of headlines by any 
particular person. It is entirely possible that there is a tendency 
to spend slightly more time on headlines which require that 
maximum than on headlines of extreme brevity and simplicity. 

Test 1 offers an interesting comparison since, according to in- 
structions, each headline was read once. The low negative cor- 
relation obtained between recall and length of headline is only 
slightly favorable to the shorter headlines. One feature of this 
test which approached the normal advertising situation was that 
the subject was expected to retain the main idea expressed in the 
headline rather than the exact phrasing. The correlation is so 
low as to imply many exceptions to the rule that the idea expressed 
in fewer words is remembered better. It should be noted in
connection with this test that the subjects did not expect to be asked
to remember the headlines. 

The low positive correlation between recognition scores and 
length of headline shows that length is not an important factor 
in determining the future recognition. Insofar as this correla- 
tion is significant it favors the longer headline. However, the 
advantages of long headlines in gaining recognition would apply 
in even greater degree to the text or to complete books. The 
functions and qualities of a good headline do not permit un- 
limited expansion. If the function of a headline is to carry a sell- 
ing idea and to leave a positive impression favorable to recall after 
a brief glance, it must accomplish that goal with a minimum of 
the reader’s time and effort. It is equally important that head- 
lines having the function of arousing interest in further reading 
should do so in the shortest time possible. 


CONCLUSIONS 


The following conclusions relative to length of advertising 
headline seem justified by the results of the study: 


1. The length of an advertising headline may usually be ex- 
pressed in number of words. The copy writer may learn to refine 
his judgment slightly by means of subjective estimates of ease of 
comprehension. 


2. Insofar as readers spend a uniform, brief period of time on 
each headline seen, brevity has a decided advantage. It is a matter 
for conjecture as to whether the shorter headlines were remem- 
bered better because they could be read so much more quickly 
and easily or whether the apparent advantage was all owing to 
other factors influencing ease of recall. 


3. It seems reasonable to assume that the length of time spent 
on a headline is actually influenced greatly by context and the 
length of the headline itself. Where a variable time of exposure 
was allowed (Test 1) nearly all of the advantage of brevity dis- 
appeared. 

4. Short advertising headlines are still to be desired within 
the limits dictated by the particular idea to be expressed. To the 
extent that the reader expends a uniform amount of time or sets 
a maximum limit of time on each headline, brevity gains an
advantage. 

This conclusion, which is really the most important outcome of 
the study of headlines, brings up the need for further evaluation 
of the three methods of investigation pursued. The scores for 
immediate recall after a uniform exposure in the tachistoscope, 
if taken alone, would place all emphasis upon brevity. An opti- 
mum length could be worked out easily upon such a basis. But 
each of the other studies has some merits which tend to cast doubt 
upon the validity of a test using uniform exposure. The scores for 
both the recognition test and for the test requiring the reading of 
the slips were based upon conditions more nearly applicable to 
advertising, although less under control. The fact that the in- 
fluence of length lost practically all significance in these tests can 
leave little doubt but that uniformity of exposure (as in the test 
using the tachistoscope) exaggerates the merits of brevity. A 
comparison of the scores shown in the table for individual head- 
lines, not attempted in this discussion, further substantiates this 
conclusion. Brevity may easily be gained at too great a sacrifice. 

5. Just how long a good headline may be is not shown clearly 
by the tests. However, the table shows that nearly nine-tenths 
(89 per cent) of the subjects remembered one headline containing 
nine words. The immediate recall scores for some of the shorter 
headlines were low and others were high, proving that brevity 
is only one of the factors favorable to easy recall. Since the sub- 
jects were all college students, these results could not be applied 
to average magazine readers without making due allowances for 
the more limited buying interests of students and the lower aver- 
age memory span of the population as a whole. 


FURTHER SUGGESTIONS 


A careful accounting of all of the factors involved in this prob- 
lem would require a volume. The following questions are sug- 
gestive of possibilities for future study: 

1. How much does concreteness of expression aid in the process 
of recall? 


2. What is the effect of other advertisements (competitors’ and 
those of non-competitors) upon the influence of the individual
advertisement?


3. How effectively may an apt illustration reduce the psycho- 
logical length or complexity of a headline? 

4. To just what extent does the importance of context of a
headline nullify attempts to define an optimum physical or
psychological length?


5. Do the common practices of emphasizing important words 
and stepping down the type face of a headline by stages offset 
the disadvantages of an otherwise long headline? 














INDIVIDUAL DIFFERENCES IN 
PENITENTIARY SENTENCES GIVEN BY 
DIFFERENT JUDGES 


FREDERICK J. GAUDET, Dana College 
GEORGE S. HARRIS, New Jersey Law School 
CHARLES W. ST. JOHN, Dana College 


during the past decade. Most criminologists and penolo- 
gists believe that the results of their findings can properly 
be put into practice only under a new system of judicial adminis- 
tration. Many proposals have been made for the improvement of 
our administration of justice. It has been proposed that judges 
be used only to supervise the determination of guilt or innocence 
and that the sentencing be passed over to a sentencing board. 
Another suggestion is that every offender be given an indetermi- 
nate sentence and that the length or severity of the punishment 
depend upon the evidence of reformability which the prisoner 
shows under incarceration. Other suggestions such as discarding 
the jury system, changing the type of jurors selected, forming 
advisory boards to help the judge in sentencing, are frequently dis- 
cussed by criminologists, penologists, judges, lawyers, and lay 
groups. All of these suggestions for the reformation of the 
courts have intensified legislative action, as indicated in a state- 
ment by Professor Raymond Moley in his recent book “Politics 
and Criminal Prosecution.” He says, “A demand for reform 
in the administration of justice has acquired considerable momen- 
tum of late, and recent legislative sessions have ground out many 
laws amending criminal procedure. Sheer conjecture guided 
much of this legislative action. Little study was made as to the 
need for the laws enacted or the probable effect of them. Re- 
search into the fundamental conditions under which administra- 
tion of justice operates guides only a handful of states.” 
Of course, Professor Moley does not imply that there have been 
no scientific studies of judicial administration, for it is probable
that he has done more to popularize the results of such studies 
than any other individual in this country. He has been a con- 
sultant, or a more active participant, in most of the large and im- 
portant scientific studies of the administration of justice in the 
United States. But he does imply that the use made of these 
studies is sporadic and inefficient. 

The previous studies, such as those of the New York Crime 
Commission, the Cleveland Crime Survey, and the Illinois Crime 
Survey, have shown the differences in the sentencing tendencies 
in different parts of the state; for instance, the differences between 
sentences received for crimes when tried in the urban districts 
and the same crimes when tried in the rural districts. Other 
studies have examined the work of prosecutors. The judge, 
however, as one of the important factors in the judicial process, 
has not been so closely studied, although suggestions as to methods 
of modifying the judicial process have stressed changes at this 
point. An instance of this is the idea of former Governor Smith’s 
that sentencing should be done by sentencing boards instead of 
by the individual judge. 

The object of the present project is to study the judge as an 
important factor in the judicial process. One of the old constitutions
said that we have “a government of laws, and not of men.”¹
In the minds of many, this conception has been somewhat changed 
and the prevailing conception seems to be similar to that ex- 
pressed by our Chief Justice Hughes, namely, that “our govern- 
ment is one of laws, through men.” If this is true it is interesting 
to see how the laws, supposed to embody justice, are administered 
by these men. 

In a previous article,² the authors presented evidence to show
that the product of the law depends to a certain extent upon the 
judge who administers it. It was shown that some judges are 
much more severe in their sentencing tendencies than others. The
comparison was based upon the kind of sentence given. It was 


¹ The Constitution of Massachusetts of 1780.


² “Individual Differences in the Sentencing Tendencies of Judges,” Journal of
Criminal Law and Criminology, Vol. XXIII, No. 5, January-February, 1933.

shown that one judge gave 57.7 per cent of his sentences as jail
sentences while another gave 33.6 per cent for the same crimes. 
Again one judge gave 15.7 per cent of his sentences in the form 
of suspended sentences while another gave more than twice as 
many. In other words, these laws in passing through the courts 
of these men give results that are quite dissimilar. These data 
show that if a prisoner is guilty of a certain crime and is sentenced 
by one judge he has 33 chances out of 100 of going to jail, but if 
sentenced by another judge he has 57 chances out of 100 of going 
to jail. Evidently, insofar as our laws which go to make up 
justice are concerned, the factor of “men” is a considerable one. 

The next project was to determine whether there were as great 
differences in the severity of sentences in each given category. 
The same data (judges, crimes, cases) were used in this study as 
in the previous one. These comprise all the sentences given by 
six judges for a certain specific group of crimes³ over a period of
nine years. These 7748 cases involved in the study were taken from 
the court records of one county in New Jersey. 

Since the rule is that there is no selection of the cases which 
the judge is to sentence, but that the sentencing of a particular 
prisoner by a particular judge is a matter of chance, it is obvious 
that by chance each judge will get an equal number of cases whose 
sentences should be long or short. In other words, given a suf- 
ficient number of cases, one would expect that two judges would 
give sentences whose average severity would be about equal (pro- 
vided that the judges were influenced only by the circumstances 
of the law, the crime, and the prisoner). In the same way if one 
finds that the average length of the sentence given by one judge 
is longer than that of another, one does infer that the factors which 


³ “A Study of Individual Differences in the Sentencing Tendencies of Judges.”
These crimes were chosen because of their relatively great frequency. The crimes 
which were used were: larceny, larceny and robbery; breaking, entering, larceny 
and receiving; breaking and entering; robbery; embezzlement; burglary; assault; 
battery and robbery; larceny from the person; assault and battery with intent to 
rob; violations of the Hobart Act (New Jersey’s prohibition law); adultery; rape; 
assault, battery and rape; assault and battery with intent to rape; abuse; carnal 
abuse; and finally assault and battery with intent to abuse.

determine this difference lie in the judge (provided that the
number of cases is sufficiently large). All scientific work is based 
upon certain assumptions; we prefer to begin our study with a 
statement of the assumption upon which it is based. The value 
of the conclusions drawn from the data is dependent primarily
upon the validity of the assumption. 
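
The inference described above, that a sufficiently large difference between two judges cannot reasonably be ascribed to the chance assignment of cases, can be illustrated with a short computation. The following is a minimal sketch, not a procedure used by the authors: it applies a standard two-proportion z-test to the percentages of imprisonments and the total numbers of cases reported for Judges 2 and 4 in Table I. The function name `two_proportion_z` is introduced here for illustration only, and the normal approximation is an assumption of the sketch.

```python
from math import sqrt, erfc


def two_proportion_z(p1, n1, p2, n2):
    """Return (z, two-sided p-value) for the difference of two proportions.

    Illustrative helper only; it is not part of the original study.
    """
    # Pooled rate under the hypothesis that both judges share one tendency.
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    # Standard error of the difference under that hypothesis.
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


# Percentage of imprisonments and total cases as reported in Table I:
# Judge 4 (57.7 per cent of 1489) against Judge 2 (33.6 per cent of 1693).
z, p = two_proportion_z(0.577, 1489, 0.336, 1693)
print(f"z = {z:.1f}, two-sided p = {p:.3g}")
```

For these figures z exceeds 13, so a difference of this size would arise by chance only with negligible probability. This is in line with the authors' reasoning that, given enough cases, the difference is to be sought in the judge rather than in the cases.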

The authors are particularly desirous of pointing out that they 
are not considering any condemnable factors involving the judge 
and do not intend to explain the differences in the sentencing 
tendencies of these judges on such grounds. The study assumes 
that judges, being human, will show the same individual differ- 
ences as other humans in other fields of endeavor. Of course, every- 
one would expect differences in the sentencing tendencies of honest 
and dishonest judges, but this study is not centered upon such 
differences. 

The Results. This second study was conducted by tabulating 
all of the jail and penitentiary sentences given by each of the 
six judges. The results of this study are shown in Table I.⁴
Some of the results from the previous study are included in this 
table in order to compare the two measures (frequency of jail 
sentences and severity of jail sentences) of the severity of the 
sentencing tendencies of these judges. It will be noticed that 
Judge 2, who is the most lenient in his sentencing tendency when 
measured by the percentage of jail sentences given by him, is also 
one of the most lenient when the length of penitentiary and jail 
sentences is used as the unit of measurement. In the table the three 
rows labelled upper quartile, median and lower quartile refer to 
the points dividing the entire number of cases into first, second, 


⁴ It should be noted that these medians and quartile positions are only
approximations, because the distributions from which they are derived are arranged
in such a way that sentences of a certain number of months and those with the 
same number of months with the addition of a fine are grouped together. For 
instance, beginning at the top of the distributions we have sentences of 24 months 
and under it sentences of 18 months and $200 fine. It is assumed that the latter 
is less severe than the former. In some cases where similar situations occur it may 
be doubtful whether this is a correct arrangement of these sentences in order of 
severity. However, it is uncertain whether these factors made an appreciable 
difference in the results which are being used, namely averages.

TABLE I
Average Length of Jail or Penitentiary Sentences Given by All Judges

Judge                               1        2        3        4        5

Upper Quartile                   18 mos   12 mos   18 mos   15 mos
Median Sentence                  12 mos    6 mos   10 mos    9 mos
Lower Quartile                    6 mos    3 mos    6 mos    6 mos
No. of jail and penitentiary
  sentences                        170      230      455      330      104
Percentage of imprisonments ⁶     35.6%    33.6%    53.3%    57.7%    45.0%
Total no. of cases. Sentences
  of all kinds                    1235     1693     1869     1489      480


TABLE II
Showing Frequency with Which Judges Use Sentences of Three or
Multiples of Three

Judge                               1       2       3       4       5       6

No. of sentences of three
  months & over                    170     184     430     322     102     170
Percent in multiples of
  three months                     99.4    96.7    78.4    94.7    99.0    93.2

third and fourth quarters as to severity of sentences.⁵ (See Table
I.) This means that half of Judge 2’s sentences are between 3 and 
12 months in length. On the other hand, Judge 1, who also gave 
a small percentage of sentences in the form of imprisonments, 
gave sentences of greater length than any of the others. It should 
be noticed that Judges 1, 2, and 3 were sitting at the same time. 
When we consider the difference in the percentage of sentences 


⁵ Only jail and penitentiary sentences were used because of the difficulty of
arranging in order of severity many of the State imprisonment sentences. For 
instance, many of the state prison sentences were for 2 years, 3 years, 2 to 4 years, 
2 to 5 years, etc. The authors were unable to find any way in which such sentences 
might be arranged in order of severity. 


⁶ Data taken from previous article; imprisonments include county and city jail,
penitentiary and state prison sentences.

which are imprisonments and the average length of sentences
given by Judges 2 and 3 we can see that our justice is certainly 
affected by the human equation. 
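
The meaning of the quartile rows in Table I can be made concrete with a small computation. The sketch below uses a hypothetical list of fifteen sentence lengths in months (invented for illustration; it is not the court data), chosen so that its quartile points coincide with those reported for Judge 2. It relies on Python's standard `statistics.quantiles` function with the `inclusive` method, which is only one of several conventions for locating quartiles.

```python
from statistics import quantiles

# Hypothetical sentence lengths in months (not the authors' data).
sentences = [1, 2, 3, 3, 3, 6, 6, 6, 6, 9, 12, 12, 12, 18, 24]

# Three cut points dividing the ordered cases into four quarters.
lower_q, median, upper_q = quantiles(sentences, n=4, method="inclusive")

# Share of sentences lying between the lower and upper quartile points.
middle = [s for s in sentences if lower_q <= s <= upper_q]
share = len(middle) / len(sentences)

print(lower_q, median, upper_q)
print(f"{share:.0%} of sentences lie between {lower_q} and {upper_q} months")
```

Because several of the hypothetical sentences fall exactly on the quartile points, somewhat more than half of them lie in the inclusive interval. The grouping of sentences described in the footnote to Table I would have a similar effect on the real distributions.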

Another problem of considerable interest which arose in the 
examination of these results was the tendency for judges to give 
sentences of three or multiples of three months in length. As to 
why sentences of such length have such peculiar curative, punitive, 
or reformative values, the authors are unable to say. If the practices 
of the judges may be used as measures of what desirable sentencing 
tendencies are, it is certainly evident that sentences of three, six, 
and nine months are more efficacious in reforming and deterring 
criminals than sentences of four, seven, and eleven months which 
are found only on rare occasions. It is also interesting to note that 
some judges used this “multiple of three” formula in sentencing 
less frequently than others. Table II shows these differences very 
clearly. It will be noted that Judge 3 gives far fewer sentences of 
the three or multiple of three months variety than does any of 
the others. Whether this implies that Judge 3 considers his sentences 
more carefully than the other judges, the authors are unable to say. 
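
The percentages in Table II can be obtained by a simple tally. The following sketch shows one way of making it, assuming the sentences are available as whole numbers of months. The sample list is hypothetical, and the function name `percent_multiples_of_three` is introduced here only for illustration.

```python
def percent_multiples_of_three(months, minimum=3):
    """Return (count, percent) for sentences of at least `minimum` months.

    `count` is the number of sentences of `minimum` months or more;
    `percent` is the share of those that are exact multiples of three.
    """
    eligible = [m for m in months if m >= minimum]
    if not eligible:
        return 0, 0.0
    multiples = sum(1 for m in eligible if m % 3 == 0)
    return len(eligible), round(100 * multiples / len(eligible), 1)


# Hypothetical sentences in months (not the court records).
sample = [2, 3, 3, 4, 6, 6, 7, 9, 12, 18, 24]
count, percent = percent_multiples_of_three(sample)
print(count, percent)
```

Applied to each judge's sentences in turn, a tally of this kind would yield the two rows of Table II.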





OBJECTIVES AND AIMS IN THE 
INTRODUCTORY COURSE IN 
PSYCHOLOGY 


HARVEY C. LEHMAN and PAUL A. WITTY 


Ohio University and Northwestern University 


THIS paper presents a brief analysis of some of the difficulties
encountered in formulating objectives for an introductory
course in psychology. The catch-word “objective”
became popular with educators during and immediately following 
the World War. The World War added to or expanded several 
terms in our educational jargon. For example, we now have 
“batteries” of tests, “line and staff” in educational administration, 
“objectives.” 

Just before the outbreak of the war a commission was appointed 
by the National Education Association to study secondary educa- 
tion. Several committee chairmen published preliminary state- 
ments regarding the aims and the purposes of various high-school 
subjects (1), but the final report of the commission—the widely 
publicized Cardinal Principles of Secondary Education—appeared 
in 1918. (2) The preliminary report of 1913 made no reference to 
“objectives”; the committee chairmen, however, spoke frequently 
of “aims” and “purposes.” In the final report, “objectives” were 
set forth on almost every page. These reports suggest that the 
term “objectives” was appropriated by the educator during the 
recent war.¹

Terminological addition or expansion is frequently associated 
with enhanced clarity of expression or thought. But does this 


¹ Similar corroboratory evidence is in the books and bulletins on curriculum
construction that have been published since 1914. The objectives (usually vague
resolutions for reform or continued “good work”) are ubiquitous. For example,
Smith reports “the unanimous statement of all courses of study that their offerings 
include ‘that amount of grammar which is functional’ in speech and writing.” 
But, as is pointed out by Smith, nobody knows what amount (or kind) of grammar 
is functional. (3) p. 35f.

particular appropriation of the term “objective” represent a genuine
advance in educational thinking? It is doubtful that a clearer
understanding of any problem can be attained by appropriating 
terms that have in another setting objective reality, but which in 
a changed setting suggest action that no longer possesses objective 
reality. This procedure is clearly equivocal; it serves largely as
an escape mechanism that is not only useless but also harmful,
since it sometimes enables the educator to rationalize and
occasionally to dismiss vital and difficult problems.²

When employed scientifically, words are conceptual symbols 
which possess true validity only when they represent clearly de- 
fined concepts. When basic concepts are ambiguously or illy- 
formed, objective or neologistic statement does not clarify matters. 
Nevertheless, word appropriation and expansion appear to have 
replaced careful scrutiny of basic educational problems by many 
who have stated their objectives. Some objectives reflect wishful 
thinking; some are vague; others are tautological or equivocal. 
Frequently, obscurity is cloaked by the use of analogical terms 
appropriate in one setting but obscure or even invalid in a changed 
setting.³ Obscurity is evident in the loose use of the word “objective”;
and it is apparent also in the more detailed statement of
“goals,” “aims,” or “purposes.”

As used in the army the “objective” is the point aimed at;


² Of interest in this connection is a recent fact-finding survey of secondary
education which has provided a new inventory. These surveys are sometimes alleged
to be helpful in curriculum construction, for ostensibly they provide “objectives” 
or outcomes toward which one may strive. A recent book collated by members of 
the North Central Association of Colleges and Secondary Schools suggests that 
some continue to feel that a statement of outcomes (objectives or aims?) is a 
meaningful as well as a fruitful basis for curriculum construction. (4). 


³ This practice is not engaged in solely by professional educators. Wheeler and
Perkins have appropriated a concept from physics and have labelled their psy- 
chological equivalent “The Law of Least Action.” This practice does not reflect 
equivalence, but rather equivocation, since the human energy unit is undefined 
and the units of measurement are unspecified. (5). 

Similarly, Thorndike in his recent volumes (6) (7) on human learning used 
“belonging” as a new concept. That he has confused his contemporaries in psy- 
chology is clearly seen by reference to several reviews of his work. Cf., in this 
connection, McGeoch, J. A.: Journal of General Psychology, 1933, 8, 285-296.

“aiming” on the other hand is the operation of giving the gun
the direction and the elevation necessary to hit the objective.⁴
If “aiming” posits giving the gun the desired direction, and the 
word “objective” signifies that which is aimed at, it is clear that, 
if we are to make something more than a shot in the dark or 
often to engage in random firing in the air, the objective should 
be decided upon prior to taking aim. Similarly, it seems reason- 
able that instructional objectives should be decided upon prior 
to attempting class work if the word objective is to maintain a 
semblance of its original connotation.⁵

After the World War, many educators essayed to formulate 
their instructional objectives; indeed the sponsors of almost every 
school subject apparently felt that precisely stated objectives were 
demanded. Even the teachers of Latin became amenable, and 
in the general report of the Classical Investigation, they set up 
values (rechristened objectives) that were alleged to be derivatives 
or concomitants of the study of Latin (8). In doing this for Latin 
and for other school subjects the meaning of the word “objective” 
was altered and the underlying concept was modified and en- 
hanced. It was then necessary to employ numerous qualifying 
adjectives such as “immediate,” “ultimate,” “disciplinary,” and 
“instrumental.”⁶

“Immediate” objectives were defined by the advisory committee 
of the American Classical League as “those indispensable aims 
in which progressive achievement is necessary to ensure the attainment
of the ultimate objectives, but which may cease to function
after the school study of Latin has ceased; for example, the
ability to conjugate a Latin verb or to translate a passage from 
Cæsar” (8) p. 32. And as illustrations of “ultimate” objectives,
the committee presented the following vagaries:—“Development 


⁴ By extension the word “objective” came to mean in the military sense:—
(2) the point towards which the advance of troops is directed, and (3) that which 
has been marked for capture. But among educators the term has been further 
inflated until today it lacks precise meaning. 


⁵ Precise statement need not preclude elasticity or flexibility in teaching.


⁶ Monroe and Herriott even tried to distinguish between “actual” objectives
and “paper” objectives. (9) p. 5.

of correct mental habits,” “Development of an historical and cultural
background,” and “... development of right attitudes
toward social situations” (8) p. 79. Such phrases may sound 
well, especially to the educationally unsophisticated, but they 
obfuscate neatly the whole problem of discerning what outcomes 
should be expected from the study of Latin. 


THE INTRODUCTORY COURSE IN PSYCHOLOGY 


Let us grant for the moment that the term “objective” implies 
no greater definiteness than realistic thinking is able to carry out; 
that this term is now synonymous with “aims” and “purposes.” 
What then are the objectives that are to be sought in the introduc- 
tory course in psychology? Any consideration of this topic by 
psychologists is likely to yield an abundance of self-justification 
and rationalization. Possibly it may aid us to maintain a clearer 
perspective if we examine our attitude toward certain of our colleagues.
For example, when the football coach states that one
of his main objectives is to build character, the psychologist is 
sceptical, incredulous, and a bit amused. The psychologist knows 
that, if the football coach is to retain his position for an appreciable 
length of time, he simply must win games (sometimes sacrificing 
certain phases of “character” to a degree such that teams are barred 
from scholastic competition). 

As classroom examples of unverifiable claims we contemplate 
(usually with great pleasure, sometimes with rancor) the obscure 
allegations of the Latin and the mathematics departments. The 
proponents have frequently claimed values for these subjects 
which have not been demonstrated objectively. Certain psy- 
chologists pronounced sentence: The claims are unverifiable, there- 
fore the values are non-existent. 

Let us now turn to the objectives of the introductory course 
in psychology. In so far as we are aware, no individual has been 
authorized to speak for psychologists as a group. Nevertheless, 
data regarding the objectives of the introductory course in psy- 
chology have been assembled recently by a joint committee of 
the Midwestern Psychological Association and the Southern 
Society of Philosophy and Psychology. These data were obtained
by addressing questionnaires to many teachers of the introductory
course (10). The seven major “aims” of the introductory course 
follow :— 


1. To train students in the scientific method.

2. To satisfy the student’s desire to acquire knowledge about
psychology in general.

3. To interest students in psychological problems and methods.

4. To teach the facts and principles of psychology so that the
student may apply them.

5. To train students to use the psychological knowledge for
personal adjustment.

6. To prepare the student for a better interpretation of problems
of science.

7. To teach the student the amount and the importance of
individual differences.


It will be noted that the foregoing list was published as “aims” 
rather than as “objectives.” However, since many present-day 
writers use the words “aims” and “objectives” interchangeably,⁷
we shall do likewise in the discussion following. 

It will be seen at once that all seven are somewhat general statements.
And the thirty-one subsidiary aims (that are included
in the original report) have about the same degree of specificity 
as do these seven. Indeed, several are so vague that they appear 
to be of little help either to the classroom teacher or to the test 
technician. Surely it would be difficult to develop objective devices 
for measuring and expressing the extent to which students had 
attained these objectives. 

Consider, for example, the first of the seven major objectives :— 
To train students in the scientific method. A worthy objective 
indeed! But precisely what is the scientific method? 

“The scientific method has been variously defined. Perhaps 
the simplest statement would be that it consists in finding out, 
by observing, reference to the literature or actual experiment, all
ascertainable facts pertaining to a given situation. An indispensable
adjunct is the ability to properly interpret the facts after
they have been found” (14) p. 555.


⁷ For example, see references No. 8 and 9. See also the change in the dictionary
definition that has occurred during the past two decades. (11) (12) (13). This
dénouement should teach us that it is impossible to ameliorate an educational
dilemma by the introduction of neologistic or paralogistic statement.

If the foregoing statement really expresses what is meant by 
“the scientific method,” does not the first aim of the introductory 
course imply a generous amount of transfer of training? Surely 
the present status of the controversy regarding “transfer of train- 
ing” would lead one to question the validity of this implication. 
Surprising is it that psychologists would rationalize thus. How- 
ever, many scientists forsake the scientific method when they deal 
with subject matter that lies outside their particular fields, and 
when they set forth implications from scientifically valid data.⁸
The setting of objectives by any group of scientists seems to be a 
philosophical effort which requires abandoning the scientific 
method. 

It seems that the scientist’s “scientific method” is limited almost 
wholly to his specialized interest and endeavor. In religious, in 
political, and in international affairs, he frequently is unscientific 
or partial in his observations. He occasionally exhibits strong 
prejudices and accepts prevailing and sanctioned absurdities. He 
appears also to abandon the sacred principles of “objectivity” and 
“verifiability” when he turns philosopher and attempts to state 
his objectives. 

Probably most psychologists now believe that the introductory
course does have values that cannot be substantiated by our rather
crude measuring instruments. However, if attainment of these
goals is a generally accepted aim and if, concurrently, the psychologist
admits his inability to substantiate the claim, he should
admit frankly that he is in precisely the same predicament as
are the Latin and the mathematics teachers who have frequently
attached chimerical values to study of their subjects.


⁸ With regard to this, the chairman of the Chicago Section of the American
Chemical Society has written recently:—

“No better example of this can be cited than the deplorable record of scientific
men in all warring countries during the last war. Nothing was done outside
their specialties which could have been distinguished from the actions of the most
unlearned or unscientific, unless perhaps it was the greater vindictiveness of the
scientific men. What more shameful spectacle was witnessed behind the battle-lines
than the irrational hatreds exhibited by both individual and organized
scientific men in all countries. There is no need to be specific. The shame is on
all alike. Instead of being leaders themselves, which their training should have
made them, they permitted themselves to be led, or even driven, like sheep, by
intellectually inferior politicians and industrialists in every country” (14) p. 555.

It is a rather curious fact that historically the ancient languages 
and mathematics have borne the brunt of the attack that has been 
made by those who doubt that mastery of school work implies 
cultural growth. Nevertheless, many teachers of science have 
been rationalizing their work, stating that they were teaching the 
beginning student (among other things) “the scientific method.” 
If psychology now is going to take its stand with these optimists, 
good taste seems to require that psychologists henceforth discard 
the rather hypercritical attitude toward others. Autocriticism and 
objective verification seem needed by those who assert that their 
students develop a “scientific attitude” or even “an understanding 
of the motives of men” through the study in a beginning course in
psychology.


OBSTACLES TO THE FORMULATION OF INSTRUCTIONAL 
OBJECTIVES IN THE GENERAL COURSE 


There are several reasons why the psychology teacher finds it 
difficult to formulate his objectives. These are so inextricably 
complex and interwoven that it is difficult to disentangle them. 
In the maze one may identify the following:—(1) The difficulty of 
obtaining agreement as to which acquisitions are most desirable. 
(2) The lack of agreement as to whether we shall concentrate 
to develop rather fully a few most able students or try to raise 
the general level of knowledge and understanding. (3) Institu- 
tional inequalities in laboratory equipment, in library facilities, 
in the preparation and ability of the teaching staff, and the like. 
(4) The status of psychological concepts underlying frequently 
stated objectives. (5) Our limited knowledge of the psychology 
of learning. (6) The disparity among the objectives that have 
been published, and (7) The lack of any obvious or necessary 
relationship between “immediate” and “remote” objectives. The 
foregoing obstacles will be discussed briefly in order. 

(1) The difficulty of obtaining agreement as to which acquisitions
are most desirable. In teaching any school subject, selection
of subject matter is inevitable since the instructor cannot present 
all of the known facts and principles. In psychology the problem 
of selecting subject matter judiciously is especially precarious 
both because of the newness of the science and because of the 
exceptionally large number of controversial issues and contradic- 
tory points of view. 

At the annual business meeting of the American Psychological 
Association held on Dec. 27, 1928, in New York City, a motion 
was made to have the association instruct the incoming president 
to appoint a committee of fifteen members to try to reach an agree- 
ment with respect to “the ideas which should be fulfilled by a first 
course in psychology” (15) p. 127. The motion was lost. The 
negative vote may have concealed diverse and incompatible positions
as well as the variety of aims espoused by the members.⁹
Of course, a few leaders preach their dogmas with militant cer- 
tainty, but most psychologists realize the insecurity and the ten- 
tative quality of their hypotheses. 

(2) The lack of agreement as to whether we shall concentrate 
to develop rather fully a few most able students or try to raise the 
general level of knowledge and understanding. Some university 
authorities attempt to select their students carefully and display 
considerable solicitude over the welfare of those who are con- 
spicuously able. However, most American universities devote 
their efforts to the education of large numbers of rather chaotically 
selected students. Obviously, the instructional objectives of the 
introductory course will depend in part upon the general objec- 
tives of the institution as a whole. Because of the disparity in 
practice there probably can be no nation-wide policy that will 
prove satisfactory. Individualization of instruction and homogene- 
ous grouping (occasionally advocated by psychologists for use by 
others) are seldom practiced in psychology. It seems that many 
psychologists feel that study of identical material will prove fruit- 
ful for all of their students. 

(3) A third barrier to any satisfactory nation-wide policy is 


⁹ Jensen’s recent study reveals much difference of opinion as to the relative
significance or the practical value of psychological terms (16).

found in institutional inequalities in laboratory equipment, in
library facilities, in the preparation and ability of the teaching 
staff, and the like. Peterson and Dunkle report regarding the 
teaching of psychology in teacher-training institutions in the 
South: 

“We are forced to the conclusion that the normal schools and 
teachers colleges have no adequate conception of psychology as a 
growing science and of the needs of institutions and the qualifi- 
cations of the instructors for proper training of students in this 
line. . . . Most of them, even those of collegiate rank, have no 
laboratory equipment and no funds for any, no journals, few real 
psychology books, and no plans to improve in these regards. The 
majority of them do not even have teachers for psychology who 
have been in psychology even as far as to the requirements of 
the master’s degree. Indeed, the majority have not even done 
their major work in psychology, and evidently have no profes- 
sional standards as to teaching in lines outside the range of their 
preparation. They teach because they are offered a position in 
this line, and the offer comes from persons who, though entrusted 
by the state with great educational responsibility, do not them- 
selves know that these instructors cannot possibly train students 
in psychology and educational psychology” (17). 

In view of the foregoing situation it seems obvious that certain 
of the attempts to postulate very detailed objectives for the in- 
troductory course in psychology are a work of supererogation. 

(4) The status of psychological concepts underlying frequently 
stated objectives. The psychologist’s objectives are inevitably 
equivocal since they involve the use of conceptual terms that con- 
vey varied and occasionally diametrically opposed ideas. For 
example, in that section of every introductory course which deals 
with personality, certain character traits are likely to be dis- 
cussed—such traits as aggressiveness, persistence, suggestibility, 
and the like. Today the very existence of these generalized traits 
has been brought into question by the advocates of extreme spec- 
ificity. These proponents do not lack statistical data to support 
their claim. Again and again, a battery of tests devised to measure 
so-called personality traits yields results that are so unreliable and 
so undependable (when compared with other criteria) that one
is led to question the existence of the generalized traits. In
numerous instances the instruments for measurement are very in- 
adequately conceived, and the use of terms is often clearly unsatis- 
factory. A survey of the nature of the frequently used tests 
suggests that there is a fundamental limitation in most character 
tests. Trait testers appear to assume that whatever they may 
name has objective reality; some testers need not so much to 
improve existing measuring devices as to improve or change
their thought regarding “traits.” 

Certain psychological concepts are of course much better de- 
scribed and established than are others. Indeed, a few of them 
are so well demonstrated that most psychologists would agree 
upon definitions thereof. Nevertheless, if one tried to make a 
list of all that should be taught to the beginning student, it is 
probable that many concepts would be included that are variously 
and conflictingly defined. If such a list tended to clarify present- 
day misconceptions the future progress of both students and 
teachers would be improved. However, in the history of psy- 
chology many concepts have proved to be merely transitory formu- 
lations. Therefore, perhaps the tentative nature of hypotheses 
should be taught, but uniform teaching objectives, if precisely 
stated, could be employed only by the dogmatic disciples of par- 
ticular “schools.” 

(5) Our limited knowledge of the psychology of learning. Cue 
reduction and simple association theories have been advanced 
as explanations of learning; one writer has proposed recently 
that his three laws of learning be modified and that a place be
made for a curiously but objectively postulated doctrine of “be- 
longing.” Another has written or assembled four different books 
in each of which he has unqualifiedly set forth the universal 
organismic laws; and still another has modestly advanced three 
hypotheses concerning human learning. A bibliography recently 
appearing in the Psychological Bulletin (18) contained 1200 ref- 
erences on learning; many of these references raise controversial 
issues, and demonstrate clearly psychologists’ uncertainty regard- 
ing the real nature of learning. 

The laws or facts of human learning are by no means proven, 
but choose we must from the available data and contradictory
studies and views. The choice, although imperative, is not always
fortunate, and crusaders easily find evidence alleged to support 
certain postulates. In an effort at thoroughness or completeness, 
as well as in an attempt to meet the demands of certain objec- 
tives, the psychologist is led frequently to include almost inter- 
minable specifics in his list. Of course, he has in this practice 
innumerable confreres. At the present time lists of objectives 
in certain other fields are so detailed that they are ridiculous, as 
well as meaningless.’° 

In contrast with the foregoing, some lists of objectives are so 
general as to imply miraculous transfer. This is clear in certain 
of the psychologists’ seven aims. Perhaps we should admit 
frankly that at the present time we simply do not know enough 
about the nature and subject-matter of human learning to formu- 
late lists of instructional objectives that will be reasonably de- 
fensible, intelligible, and generally useful (19) p. 21. 

(6) A sixth obstacle to the formulation of valid objectives is 
the disparity among the objectives that have been published. 
For example, in the list of thirty-one subsidiary aims of the psy- 
chologists we find these: 

(i) To interest students in psychology as a pure science with- 
out regard to applications. 

(ii) To train students to use psychological knowledge in gov- 
erning the conduct of others. 

If psychologists are willing to accept both of these objectives 
they wish apparently to train students in the application of psycho- 
logical principles and paradoxically to extol knowledge without 
application! Certain of the contradictions in the subsidiary 
objectives may be traced perhaps to the disparate views held by 
those who follow different “schools” of psychology. Should the 
beginning student be made aware of the existence of the different 
schools of psychology or should he be given a point of view that 


“For example, in one list of instructional objectives formulated for pupils of 
junior high school age we find that: 

“. . . the ability to raise a family of mice is one specific objective, another
is the ability to raise turtles and by implication it would seem necessary to include
the ability to raise every kind of animal which man might conceivably wish to
raise” (19) p. 19.

is consistent but incomplete and unrepresentative? Some textbook
writers state frankly that they prefer to avoid confusing the
immature student; therefore they set forth only a relatively few 
facts, principles and laws of experimental psychology. Internal 
consistency frequently is attained by eliminating material which 
does not support a series of centrally postulated doctrines and 
hypotheses. Other writers think that such censorship should 
not be tolerated and that it is unfair to conceal differences of 
opinion and of fact. 

The disparate points of view make the formulation of objec- 
tives especially precarious; therefore, it is not surprising that the 
objectives resulting from collating opinions are contradictory. 
We are left almost wholly in the dark as to what extent the seven 
major “aims” (as well as the thirty-one subsidiary ones) are
consonant with one another and to what degree they are related. 
Regardless of what we may ascertain in the future, it should be 
evident that it is futile to state objectives that appear to be mutually 
exclusive without justifying or at least explaining the statements. 

(7) The lack of any obvious or necessary relationship between 
“immediate” and “remote” objectives provides another barrier. 
Educational writers sometimes speak of immediate objectives and 
contrast these with remote objectives. Occasionally, a writer 
points out that the methods employed for achieving the immediate 
objectives may make it difficult or impossible to attain the remote 
(but perhaps more desirable) objectives. Consider the objec- 
tive:—“Training students in the scientific method.” We attempt 
to do this by getting the student to perform certain laboratory 
experiments. The laboratory method (when employed by quali- 
fied, experienced scientists) may be the best method for contribut- 
ing to the sum total of human knowledge, but it does not follow 
that this procedure (as employed in the usual elementary course) 
is the best means of developing and fostering the scientific spirit. 
Donald Laird reports that in one of his classes he discovered 
that fully half of the experiments had been written out neatly 
with data that had been “faked” (20). 

Laird is not the only one who doubts that there is close rela- 
tionship between our “immediate” and our “remote” objectives. 
Knight Dunlap doubts the usefulness of trying to get the beginning
student to rediscover the psychological principles that
have been unearthed by our professional forbears (21). Indeed, 
this writer believes that most of the laboratory work that is done 
by undergraduates in psychology is a sheer waste of time. 

Whether or not we concur with these views, all would probably agree
that no one is able to say which of our immediate objectives are 
most likely to instill into students the scientific attitude. And 
surely, if the value of the laboratory work is called into question, 
one may doubt also the efficiency of the lecture or classroom work 
in developing the scientific attitude. 

In light of the foregoing considerations, what are the objectives 
of the introductory course in psychology? In the judgment of 
the writers there simply are few general objectives that are highly 
useful, generally valid, or acceptable. If this statement be ac- 
cepted it follows that very precise measurement of instructional 
efficiency is for the present quite impossible in so far as the intro- 
ductory course in psychology is concerned. It follows also that 
decision regarding the detailed content of the introductory course 
(the curriculum) must also be postponed.?* 

A psychology laboratory in a certain western state was recently 
dedicated to the following ends: 

“Insight into the Nature of Mental Life, Appreciation of its
Beauty, and Wisdom in its Control

“Development of Personality, Scientific Integrity, and the Art
of Deliberate and Adequate Statement of Fact.”

With the foregoing objectives the writers have no quarrel. 
Like most other so-called “objectives,” these resolutions have one 
very great weakness: everyone is willing to accept them because 
they mean little until they are stated in far more definite terms. 


™Cf., in this connection Haggerty, M. E.: “Remaking the Psychology Cur- 
riculum,” Journal of Higher Education, 1930, 1, 78-84. Also Pechstein, L. A., 
and Broxson, J. A.: “The Determination of a Course in Psychology for the High 
School,” School Review, 1933, 41, 356-361. 


In view of the regrettable situation that Peterson and Dunkle have reported 
regarding the teaching of psychology in teacher-training institutions in the South, 
it appears that our inability to measure instructional efficiency precisely, and our 
inability to decide upon the details of a curriculum are matters of relatively minor 
importance.

As they now stand, they are a euphonious and striking play
with words—suitable for use at dedicatory ceremonies, at testi- 
monial dinners, at commencement exercises, for the framing of 
epitaphs, and for other eulogistic purposes.


EIGHTH INTERNATIONAL PSYCHOTECHNIC CONGRESS
(Continued from page 656) 

found that time and space perceptions play a large part in accidents.
Mayerhofer (Prague) argued on the ground of his
experiences against the frequently quoted theory of accident
susceptibility adhering to an individual as by destiny.

In the vocational section, Wallon (Paris) spoke on the in- 
vestigation of the character traits of children and adults. None 
of the former methods of investigation can lay sole claim to 
special means of grace. The disclosure of the weaknesses of 
methodology is, however, valuable if true. It was gratifying to 
hear an able investigator acknowledge the unsatisfactory position 
of characterology while the dabbler maintains repeatedly that 
he is able to determine the presence or absence of character traits. 

Baumgarten (Bern) reported a new, easy, practicable method 
for the investigation of interests of children and adults. Rupp 
(Berlin) reported on his trial experiment to learn the facts 
about one’s self on the basis of questionnaires. Schurer-Waldheim 
(Vienna) showed how necessary vocational psychology is to 
the understanding of criminally inclined youth. Those with
defects of intelligence amounted to approximately five per cent.
Criminality can often be traced back to a blunder in the choice 
of vocation.

Viteles (Philadelphia) reported original and successful studies
of the usefulness of motor ability tests; Dr. R. Biegel (The 
Hague) on new instruction methods for telegraphers; Dr. 
Kuchynka on the selection of the officers in the statistical office 
in Prague; Mrs. Raphale (London) on the selection of personnel 
for higher administrative positions. 

The Russian representative stated that the psychotechnical
methods used outside Russia (the psychotechnic of the middle
class) have been concerned with problems of talents or endowments and
intelligence and this without justification. One should take 
into consideration only the influence of social factors on the 
practice of one’s profession. The endowments are really only 
results of the environment. Vigorous polemics led to little 
agreement. 

The graphologists attempted to demonstrate the significance 
of graphology for the investigation of character traits as at least 
a part of psychotechnical examinations in response to which the 
prominent psychotechnologists maintained the greatest reserve. 

The report of Pechold and Cibulla (Witkowice) was presented. 
It told of the creation some ten years ago of a central office for 
the prevention of accidents in the iron works in Witkowice 
(Czechoslovakia). The earnest work of the various commissions 
(unification of terminology and of tests, scientific guidance of
work, pathology of the work, etc.) was summarized by their
chairmen (Pieron, Wallon, Lahy, Eliasberg, Vana, Baumgarten). 

A distressing incident happened just before the close of the 
Congress to which the local press gave much attention. One of 
the German participators indulged in eulogies of the high char- 
acter of the German people and was understood to disparage 
the character of the Americans. At one point his listeners voiced 
their displeasure and the American delegates left the hall with 
the assertion, “This is not science, but instead, politics.” The
general secretary closed the meeting with the remark that 
political and religious questions were not to be topics of dis- 
cussion before the Congress. The next Congress is to take place 
in America in three years. 

Dr. Franziska BAUMGARTEN-TRAMER,

Privatdozentin a.d. Universität Bern, Solothurn-Rosegg, Schweiz


Translated from the German by 
N. Wilford Skinner, Ohio University 





THE RELATIVE LEGIBILITY OF 
LINOTYPED AND TYPEWRITTEN 
MATERIAL 


EDWARD B. GREENE 
University of Michigan 


THE recent perfection of quick and inexpensive photo-printing
processes has made the use of typewritten material
in the printing of periodicals and books much more common
than a few years ago. Hence the question of the legibility
of typewritten and linotyped samples has become of more impor- 
tance. In this study the Ionic linotype was chosen as an example 
of a very clear and commonly used linotype style and compared
with similar sizes of American typewriter type. 

The main differences between these two styles, when the size 
of body is the same, are thickness of line, spacing, and indentation 
of right-hand margins. (See Figure 1.) The thickness of line is 
always the same for all typewritten material, but it varies in the 
Ionic linotype where the horizontal and oblique lines are much 
thinner than the vertical lines. The spacing for each letter is 
the same for the typewritten material, but varies in the linotype 
material according to the width of the letter; thus a w is given 
3½ times as much width as an l or an i. The right-hand margins
are even in the linotype samples, but uneven in the typewritten, 
due to the differences in operation of these machines. A more 
detailed description of the differences between these two fonts 
when set in 10 pt. body is given below: 


1. The type face: 


1.1 Thickness of line. In the linotype sample the lines vary
in thickness. The thin oblique lines of k, u, v, w, x, y,
and z, and the horizontal portions of the other lower case
letters are about .13 mm. thick. The vertical stems of all
the lower case letters are approximately .34 mm. wide.
The capitals vary also. The upright stems of .50 mm.
are tapered down to .20 mm. for the oblique and rounded
portions of the letters.

In the typewriter type, the thickness of line is the same
for all parts of both capitals and lower case letters,
approximately .35 mm.

1.2 Projectors. The ascenders, b, d, f, h, k, l, and t, project
above the top of the lower case o approximately .60 mm.
in the linotype samples, and .50 mm. in the typewriter
samples. The descenders, g, j, p, q, and y, project below
the bottom of the lower case o approximately .60 mm. in
the linotype and .50 mm. in the typewriter.

1.3 Serifs. The linotype serifs are typically short, thick, and
in the form of slightly rounded rectangles. Their horizontal
length from the stems of their letters is uniform,
approximately .20 mm. The typewriter serifs are thick,
slightly rounded rectangles which vary in length. The
serifs on i and l are .50 mm. long, on m and w only .10
mm., and on most of the other lower case letters .33 mm.,
or approximately the same as the width of the stems.

1.4 Size. The height of both the linotype and typewritten
capitals is 2.10 mm., but they differ in lower case heights;
the linotype o is 1.50 mm. high and the typewriter 1.60
mm. The two styles have the same width of lower case
o, 1.50 mm. The widths of the other letters differ according
to their shapes. In the linotype the w is 2.80 mm.
wide and the l only .80 mm. wide measuring from the
ends of the serifs. In the typewriter the w is approximately
2.00 mm. wide and the l is 1.40 mm. wide.

1.5 Blur. There is no apparent difference in clearness of
line between the linotype and the typewriter samples with
the naked eye, but under a microscope the linotype has
a little sharper transition from black to white than the
typewritten. The blur probably comes from the depression
of the paper in the typewriter when the type presses
the ribbon against the paper. This blur is much more noticeable
on copy from a typewriter than on the same copy
which has been reproduced photographically, due to the
fact that the reproduction process eliminates most of
the fine shadings by working on an all-or-none principle.
The widths of blur, distances from pure white to black,
in linotype material were on the order of .03 mm. and
in the typewriter .06 mm. The fuzz, or fringe of gray
around the letters, would probably have to be much
greater than this in order to affect the legibility noticeably.
A careful study would have to be made in order
to determine its effect.


2. Spacing. The linotype machine varies the spacing of letters
horizontally according to the width of the letters, but the
typewriter allows the same space for all letters, approximately
2.00 mm.

3. Margins. Both samples were printed in double column on
8½ by 11 inch sheets (22 by 26 cm. sheets) with an intercolumnar
space of 15 mm.

4. Indentation. The linotype paragraph indentation was approximately
3.00 mm. and the typewriter 12.00 mm. The right-hand
margins of the linotype material were straight, since the
linotype machine spaced the whole line to fit the width of
the column by inserting blanks between all the words after
the words had been set up. The typewriter left uneven right-hand
margins. In the samples used the indentations varied
from zero to five letter widths or 10.00 mm., average 1.68
letter widths or 3.26 mm.

5. Blackness of ink. A Macbeth illuminometer with a Taylor 
reflectometer was used to determine the relative blackness 
of paragraphs where the same words were printed in two 
fonts. The reflection indices averaged .726, S.D. .021. Such 
variations in blackness were very small because the samples 
were carefully printed on the same paper. Variations of 
.08 or more in reflection indices are noticeable to the naked 
eye. Blackness may well be an important factor in legibility, 
because the contrast or irradiation phenomena will be mini- 
mum at a certain blackness. The optimum blackness can 
only be determined by a carefully controlled experiment. It 
would probably bear some relationship to the thickness and 
size of the various parts of the letters.

For this experiment two sizes of type, commonly known as 10
pt. and 7 pt., were used. All samples were set solid, that is without 
extra interlinear spacing. A count of words per area showed that 
linotyped and typewritten samples were equivalent. All the 
samples were printed on a white, medium weight, bond paper 
in black ink.¹
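
The letter widths reported under Size and Spacing in the list above lend themselves to a quick arithmetic check. The following Python sketch uses only the millimetre figures quoted there; the dictionary and function names are illustrative assumptions and form no part of the original study.

```python
# Letter widths in millimetres, as quoted under "Size" in the list above.
LINOTYPE_WIDTH_MM = {"w": 2.80, "l": 0.80}
TYPEWRITER_WIDTH_MM = {"w": 2.00, "l": 1.40}

# Fixed horizontal space the typewriter allows each letter ("Spacing").
TYPEWRITER_PITCH_MM = 2.00


def width_ratio(widths, wide="w", narrow="l"):
    """Return how many times wider the wide letter is than the narrow one."""
    return widths[wide] / widths[narrow]


linotype_ratio = width_ratio(LINOTYPE_WIDTH_MM)
typewriter_ratio = width_ratio(TYPEWRITER_WIDTH_MM)

# White space left over beside a typewritten l inside its fixed cell.
typewriter_l_slack_mm = TYPEWRITER_PITCH_MM - TYPEWRITER_WIDTH_MM["l"]

print(f"linotype w/l width ratio:   {linotype_ratio:.2f}")
print(f"typewriter w/l width ratio: {typewriter_ratio:.2f}")
print(f"slack beside typewritten l: {typewriter_l_slack_mm:.2f} mm")
```

On these figures the linotype w is three and a half times as wide as the l, whereas the typewriter letters differ far less in width and the narrow letter simply sits in a cell with unused space beside it.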

There are three commonly used methods for comparing the 
legibility of type forms such as these. One measures the maximum 
distance at which the material can be read. Another measures the 
minimum brightness needed for reading. Both of these limit the 
exposure time to very small periods, hence neither is typical of
ordinary reading conditions. The third method measures the 
speed and accuracy of reading under normal conditions, using 
equivalent groups of students and standard reading tests of 
equivalent difficulty. This third method was chosen and applied 
as follows. Two equivalent forms of tests were given to college 
students under classroom conditions. Two groups of students 
were given identical samples of Test I, but different samples of 
Test II. If all the factors except the type of the second test were 
the same for both groups, then the group which improved the 
most from the first to the second test would presumably have read 
the more legible font on the second test. Each test was allowed just 
ten minutes of working time, and there was a three minute inter- 
mission between tests. 

College students, 685 in all, were used in this study because of 
their availability, their great speed and accuracy in reading, and 
their ability to cooperate in eliminating distractions of various 
sorts. The groups of students compared were nearly equal in 
age, training, and motivation. From the results which follow, 
Tables I and II, we are justified in assuming that the groups 
compared were nearly equal in reading ability as shown by scores 
on the first test. It is, of course, possible that college students read 
so well that they overcome differences in legibility of certain fonts 
by additional effort which is not at present measurable. Slower 
and more inaccurate readers might therefore show larger differences
between samples than these superior readers. This can only
be ascertained from a further study.


¹This material was furnished by the courtesy of Edwards Bros., Inc., Ann Arbor,
Mich.


TABLE I
Linotyped vs. Typewritten Material: 10 pt., 61 mm. Line.


                   Font                S.D. Mean      Improvement         S.D.
Group    Test I        Test II         (Test II)     No.    Percent    Improvement    Ratio

  1      Typewritten   Typewritten       1.65        4.1      5.9         .883         4.65
  2      Typewritten   Linotyped         1.56                 2.8
  3      Linotyped     Linotyped         1.55                 5.9
  4      Linotyped     Typewritten       1.58        5.6      6.8         .901         6.22





TABLE II
Linotyped vs. Typewritten Material: 7 pt., 51 mm. Line.


                                                  S.D.       Improvement
Group     N     Font           Test     Mean      Mean      No.    Percent

  5      84     Typewritten     I                           2.8      3.8
                Typewritten     II      75.4      1.62

  6      68     Typewritten     I                           4.1      5.8
                Linotyped       II      74.3      1.72

  7      59     Linotyped       I                           6.1      9.9
                Linotyped       II                1.66

  8      74     Linotyped       I       65.4      1.60      5.6      8.4
                Typewritten     II      71.0      1.48





To provide tests for this experiment two equivalent forms, 
known as the Michigan Speed of Reading Tests, were devised fol- 
lowing the plan of the Chapman-Cook tests. Each test consists of 
100 numbered and disconnected sections of 30 words each. In each 
section one word in the latter half spoils the meaning for the first 
part. (See Figure 1.) The student, after five samples had been 
finished, was asked to cross out the wrong word as he read and 
to finish as many as possible in ten minutes. The scores shown 
in the results below represent the number of sections correctly 
marked. The average accuracy of the college students who acted 
as subjects was 98 per cent correct of the items attempted. This 
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was due probably to the fact that the vocabulary and information 
used in the tests are approximately that of the 4th grade. 


Fig. 1.

IONIC LINOTYPE

1. We heard a noise in the pasture
lot. When we went to find out
the trouble, there was a cow mooing
sadly. Her puppy had been taken
away from her.


AMERICAN TYPEWRITER

1. We heard a noise in the
pasture lot. When we went to
find out the trouble, there was a
cow mooing sadly. Her puppy had
been taken away from her.


RESULTS 


The results are shown for the 10 pt. samples in Table I and 
for the 7 pt. samples in Table II. The following general con- 
clusions may be drawn: 

The results of the 10 pt. samples show that Group 1, which read 
only typewritten material, improved more than Group 2, which 
read typewritten material on Test I and linotype material on Test 
II. As shown by the percentage of gain, Group 1 improved 5.9 
per cent and Group 2, 2.8 per cent. The difference between these, 
3.1 per cent, is in favor of the typewritten material. 

Similarly, Group 3, which read only linotype material, improved 
less than Group 4, which read linotype material on Test I and 
typewritten material on Test II. The percentages of gain were 
5.9 for Group 3, and 6.8 for Group 4. The difference between 
these, 0.9 per cent, is again in favor of the typewritten material. 

The results of the 7 pt. samples show that Group 5, which read 
only typewritten material, improved 3.8 per cent, while Group 6, 
which read typewritten material on Test I and linotyped on Test 
II, improved 5.8 per cent. These give a difference of 2.0 per cent 
in favor of the linotyped material. Group 7, which read only 
linotyped material, improved 9.9 per cent, while Group 8, which 
read linotyped material on Test I and typewritten on Test II, im- 
proved 8.4 per cent. These give a difference of 1.5 per cent in 
favor of the linotyped material.

Thus it is apparent from these results that the linotyped material
was read a little faster than the typewritten material in 7 pt. size, 
but a little slower in the 10 pt. size. 

In evaluating these results, we must also take into account the 
reliability of the differences as indicated by the chances that they 
will occur again in similar test situations. From the standard 
deviations of the improvements, it has been calculated that the 
differences between gains must be not lower than 4.5 per cent 
in order to be significant. None of the differences noted are as 
large as this, so one must conclude that the tendencies noted are 
so small as to be easily reversed by chance factors in the test situa- 
tion, such as differences in fatigue, practice, motivations, and dis- 
tractions of the groups compared. 
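
The comparison of gains described above can be retraced with a few lines of Python. This is only a minimal sketch: it uses the percentage gains quoted in the text and the 4.5 per cent criterion, the variable and function names are hypothetical, and it does not attempt to reproduce the calculation from the standard deviations, which the text does not give in detail.

```python
# Smallest difference between gains that the text treats as significant.
SIGNIFICANCE_THRESHOLD = 4.5  # per cent

# Each entry: (label, gain of the group reading typewritten material on
# Test II, gain of the group reading linotyped material on Test II).
# All gains are the percentages quoted in the Results section.
COMPARISONS = [
    ("10 pt., Groups 1 and 2", 5.9, 2.8),
    ("10 pt., Groups 3 and 4", 6.8, 5.9),
    ("7 pt., Groups 5 and 6", 3.8, 5.8),
    ("7 pt., Groups 7 and 8", 8.4, 9.9),
]


def summarize(comparisons, threshold):
    """Return (label, size of difference, favored font, significant?) rows."""
    rows = []
    for label, typewritten_gain, linotyped_gain in comparisons:
        difference = round(typewritten_gain - linotyped_gain, 1)
        if difference > 0:
            favored = "typewritten"
        elif difference < 0:
            favored = "linotyped"
        else:
            favored = "neither"
        size = abs(difference)
        rows.append((label, size, favored, size >= threshold))
    return rows


RESULTS = summarize(COMPARISONS, SIGNIFICANCE_THRESHOLD)

for label, size, favored, significant in RESULTS:
    verdict = "significant" if significant else "not significant"
    print(f"{label}: {size:.1f} per cent in favor of {favored} ({verdict})")
```

Run on the quoted figures, the sketch gives differences of 3.1, 0.9, 2.0, and 1.5 per cent, none of which reaches the 4.5 per cent criterion.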

The main conclusion, therefore, is that the two fonts are nearly 
equal in legibility for these students on these particular ten minute 
tests. A conclusive experiment of this sort would have to include 
measures of an individual’s energy expenditure. Such measures 
are at present not available for this kind of work. 


COMPARISON WITH PREVIOUS INVESTIGATIONS 


The evaluation of different type faces has had a long history 
in which aesthetic values seem to have played a major rôle. Careful
measurements of certain phases of legibility have been made
by Burtt (1), Rothlein (2), and Pyke (3), and Pyke has given a 
very complete review of previous work. None of these three ex- 
perimenters used the Ionic Type, which is the common news print 
in America, and only one, Rothlein, used the American Typewriter 
Face. Moreover, the experimental methods were usually limited 
to the use of single letters or words, and to short reading periods— 
often less than one second. Hence no results have been found 
which are directly comparable to this study. 

Rothlein found that the American Typewriter Face was twelfth 
among sixteen faces in visibility at a distance, but she also points 
out that visibility depends upon the boldness of the type. “The 
optimum width of line was in the neighborhood of 275 to 333 
microns—below 250 and above 450 microns the letters became 
relatively illegible.” The width of line of her typewritten samples
was 200 microns, while ours was 350 microns, both for ten point
size. Her results indicate that type faces which have the same 
size letters, width of line, blackness of ink, and nearly the same 
simplicity of design are equally visible. 
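
The stroke widths mentioned above can be set against the limits quoted from Rothlein with a short Python sketch. The function names and the label "intermediate" (for widths that are neither in the optimum band nor beyond the quoted limits) are assumptions made for illustration; only the numerical limits and the two widths come from the text.

```python
# Limits quoted from Rothlein, in microns.
OPTIMUM_RANGE_MICRONS = (275, 333)   # "optimum width of line"
LEGIBLE_RANGE_MICRONS = (250, 450)   # outside this, "relatively illegible"


def mm_to_microns(width_mm):
    """Convert a width in millimetres to whole microns."""
    return round(width_mm * 1000)


def classify_stroke(width_microns):
    """Place a stroke width relative to the quoted limits."""
    legible_low, legible_high = LEGIBLE_RANGE_MICRONS
    optimum_low, optimum_high = OPTIMUM_RANGE_MICRONS
    if width_microns < legible_low or width_microns > legible_high:
        return "relatively illegible"
    if optimum_low <= width_microns <= optimum_high:
        return "optimum"
    return "intermediate"  # hypothetical label, not Rothlein's term


rothlein_typewriter = classify_stroke(200)                 # her samples
present_typewriter = classify_stroke(mm_to_microns(0.35))  # the .35 mm. line

print("Rothlein's typewritten samples:", rothlein_typewriter)
print("Present typewritten samples:   ", present_typewriter)
```

On the quoted limits, a 200 micron line falls below the 250 micron bound, while a 350 micron line lies just above the optimum band and well inside the legible range.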

Paterson and Tinker * reported that the American Typewriter 
type was read by college students, during a two minute period, 
5.1 per cent more slowly than Scotch Roman and six other type 
faces in common use. In this case and also in that of Rothlein 
the American Typewriter type was considerably less bold than 
the other types which were read faster or at greater distances. 

Pyke concludes that “extremely large typographical differences 
must be present before it is possible to say that there is any differ- 
ence in the objective legibility of types.” Further he states that 
types will be legible if they suit the psychological makeup of the 
individual and his established reading habits. Our results there- 
fore confirm, in a small fashion, the results from other methods. 


SUMMARY 


A comparison of the speed and accuracy with which Ionic lino- 
type and American typewritten samples were read by nearly 
equivalent groups of college students was made by means of ten-minute
reading tests on both 10 pt. and 7 pt. sizes.

The results show that the linotyped samples were read a little 
faster than the typewritten on 7 pt. material, but the reverse was 
true for the 10 pt. material. The statistical reliability of these 
tendencies was very small, indicating that the legibility of the two 
fonts is approximately the same. The reader is cautioned against 
generalizing too liberally from these results to other classes of 
students, other materials, and longer periods of time.


MORE ON SEX DIFFERENCES IN
HANDWRITING¹


HARRY TENWOLDE 


Union Grammar School, Imperial, California 


NUMEROUS studies have been completed to determine
whether the sex of a writer determines the characteristics
of penmanship.² This short study attacks the problem
from two approaches: (1) a statistical analysis of differences in 
penmanship quality of samples of the handwriting of one hundred 
boys and one hundred girls, pupils in grades four through eight 
inclusive in an elementary school with an equal number of sub- 
jects (twenty boys and twenty girls) from each grade; and (2) 
a check on the earlier studies, using teachers’ judgments as to the 
sex of the writers of forty samples of penmanship chosen at ran- 
dom from the total number of samples available for the study. 
The procedure of the study. The data reported here were secured
during a penmanship survey in the El Centro, California,
Public Schools during November, 1930. About eight hundred 
pupils in grades three through eight inclusive in five elementary 
schools wrote from a uniform “copy.” These pupil responses 
were scored for penmanship quality, using the Thorndike Handwriting
Scale as a criterion. The attack on the statistical evaluation
of sex differences involved only the computation of certain
constants and the subsequent interpretation of the significance of
these findings. Only two hundred samples of penmanship were
used in this portion of the study. The only selection consisted in
limiting the sampling to include the handwriting of twenty boys
and twenty girls in each of five school grades, grades four to
eight inclusive.


¹This study is based on a part of a Master’s thesis prepared under the direction
of Dr. A. S. Raubenheimer of the University of Southern California.


²M. E. Broom, B. Thompson, and M. T. Bouton, “Sex Differences in Handwriting.”
Journal of Applied Psychology, 13 (1929): 159-166.

J. E. Downey, “Judgments of Sex of Handwriting.” Psychological Review, 17
(1910): 205-216.

S. M. Newhall, “Sex Differences in Handwriting.” Journal of Applied Psychology,
10 (1926): 151-161.

P. T. Young, “Sex Differences in Handwriting.” Journal of Applied Psychology,
15 (1931): 486-498.

Young cites earlier work by D. Awramoff (1903), by A. Binet (1906), by E.
Meumann (1907), by J. S. Kinder (1926), and by D. Starch (1913), in addition
to certain of the writers cited in this footnote.

For the second approach to sex as a factor in penmanship 
quality, forty papers were selected at random from among the 
several hundred submitted by the pupils in the five schools. 
These forty papers were grouped on a bulletin board and num- 
bered from one to forty. This bulletin board was placed in each 
of the five schools, and the teachers in all the grades in which 
penmanship material was collected were given a blank form, 
Figure I, together with the following instructions: 


1. You are asked to judge today the sex of the writer of each 
one of the forty numbered samples of penmanship on the tempo- 
rary bulletin board found in the principal’s office. 

2. Use the rating sheet which will be given you by the prin- 
cipal. 

3. On the sex judgment, check one blank only (boy or girl) 
NOT BOTH. (Note: A check on the line after “boy” for sample 
one means that in your judgment this sample was written by 
a boy.) Be sure to judge each sample carefully. 

4. Return the blank to the principal. Be sure that you have 
signed and dated it. 


Note: These samples were chosen at random from those ob- 
tained in the recent penmanship survey in this school system. 

All of the teacher rating blanks were collected from the five 
principals’ offices, and judgments were tabulated. 

Teacher ____________          Date ____________

SAMPLE NO.     THIS SAMPLE WAS WRITTEN BY A (Check One)

   1           Boy ____     Girl ____
   2           Boy ____     Girl ____
   . . .
  40           Boy ____     Girl ____

Figure I. The rating blank used by the teachers in making judgments of the sex
of the writers of the forty samples of pupil penmanship.


The findings. The Thorndike Handwriting Scale quality scores
for the one hundred boys ranged between 7.0 and 14.0, with a
mean of 9.46 and a standard deviation of 1.1 (σB). The range
for the one hundred girls included scores from 7.0 to 14.0. The
mean was 9.94, and the standard deviation was 1.3 (σG). The
difference favored the girls. The obtained difference between the
average quality scores for boys and girls was 0.48, or approximately
ninety-four per cent of what it should be to insure a completely
reliable difference. The difference between the average quality
measure for boys and the average quality measure for girls is 2.81










times the standard error of the difference. The chances are about
99.74 in 100.0 that the true difference (the difference between the
true measures) is greater than zero. On the basis of this finding,
the girls in this experimental situation write better than the boys,
but the difference is not an entirely significant (reliable) one.³


TABLE I

Per Cent of Error of Correct and Incorrect Judgments as to the
Sex of the Subjects

                               RATING OF SEX
Sample     Sex of           First              Second
Number     Subject        M        F          M       F

   1         M          89.47    10.53       100       0
   2         F          42.11    57.89        30      70
   3         F          26.32    78.95        30      70
   4         M          68.42    31.58        90      10
   5         M          84.21    15.79        80      20
   6         F          10.53    89.47        20      80
   7         F          21.05    78.95        10      90
   8         M          94.74     5.26       100       0
   9         M          57.89    42.11        30      70
  10         M          36.84    63.16        40      60
  11         F          42.11    57.89        40      60
  12         F           5.26    94.74        10      90
  13         M          47.37    52.63        90      10
  14         F          10.53    89.47         0     100
  15         F          78.95    21.05        90      10
  16         F          84.21    15.79        60      40
  17         F          47.37    52.63        60      40
  18         F          68.42    31.58        60      40
  19         M          73.68    26.32        80      20
  20         M          36.84    63.16        90      10
  21         M          57.89    42.11        30      70
  22         M          10.53    89.47        10      90
  23         M          78.95    21.05        60      40
  24         M         100.00                 10      90
  25         F          57.89    42.11        50      50
  26         F          26.32    78.95        40      60
  27         M         100.00                100       0
  28         F          15.79    84.21        40      60
  29         M          94.74     5.26        90      10
  30         F          10.53    89.47        10      90
  31         F          31.58    68.42        60      40
  32         F          42.11    57.89        50      50
  33         M          84.21    15.79        70      30
  34         F          15.79    84.21        40      60
  35         F          63.16    36.84        60      40
  36         F          21.05    78.95        20      80
  37         F          31.58    68.42        70      30
  38         M          31.58    68.42        50      50
  39         M          94.74     5.26        70      30
  40         M          78.95    21.05        60      40

N=19 judges on first judgment; ten, on the second
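The reliability of the obtained difference reported in the findings can be checked with a short computation. The sketch below assumes that the standard error of the difference was taken as the square root of the sum of the squared standard errors of the two means (independent groups), and that "complete reliability" means a difference of three times its standard error; neither assumption is stated in the text, and the function names are illustrative only. Under these assumptions the ratio comes out close to the reported figure.

```python
import math


def critical_ratio(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Return (difference, standard error of the difference, their ratio).

    Assumes two independent groups, so the squared standard errors
    of the two means simply add.
    """
    se_diff = math.sqrt(sd_1 ** 2 / n_1 + sd_2 ** 2 / n_2)
    diff = mean_2 - mean_1
    return diff, se_diff, diff / se_diff


def normal_cdf(z):
    """Area of the normal curve below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Figures from the text: boys 9.46 (S.D. 1.1), girls 9.94 (S.D. 1.3), 100 each.
diff, se_diff, ratio = critical_ratio(9.46, 1.1, 100, 9.94, 1.3, 100)

# Chance that the true difference is greater than zero.
chance_above_zero = normal_cdf(ratio)

# Share of the assumed three-standard-error criterion that was reached.
share_of_criterion = ratio / 3.0

print(f"difference          = {diff:.2f}")
print(f"S.E. of difference  = {se_diff:.3f}")
print(f"critical ratio      = {ratio:.2f}")
print(f"chance above zero   = {100 * chance_above_zero:.2f} in 100")
print(f"share of criterion  = {100 * share_of_criterion:.0f} per cent")
```

The small discrepancies from the printed values are of the size one would expect from rounding and from the use of a printed probability table.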

The findings as to the sex of the subjects writing the forty
samples used as a basis for the teachers' judgments are given in
Table I.

It is interesting to compare these data with the actual facts for 
eight of these samples which are shown in Figure II. The 
samples in Figure II are correctly marked with the school grade 
and the sex of the writer. 

On the first judgment of the forty samples of penmanship the
teachers identified the sex of the writer correctly in 63.0 per cent
of the instances, on the average. This is markedly in agreement with
the findings of earlier investigators. Broom, Thompson, and Bouton
reported that a group of twenty-four judges judged the sex of persons
correctly from their handwriting in 68.0 per cent of the instances on
the first judgment, and in 67.7 per cent of the instances on a second
judgment.⁴ Downey notes that the range of accuracy in detecting sex
from penmanship is between 60.0 and 77.5 per cent.⁵ Kinder
found a range of from 58.0 to 75.0 per cent accuracy in detecting
sex from penmanship, with a mean accuracy of 68.4 per cent.⁶
Newhall found an accuracy range between 56.0 per cent and 59.0
per cent, with a mean of 57.0 per cent.⁷

³H. E. Garrett, Statistics in Psychology and Education. New York: Longmans,
Green and Company, 1926. Pages 128-133.

⁴M. E. Broom, B. Thompson, and M. T. Bouton, “Sex Differences in Handwriting.”
Journal of Applied Psychology, 13 (1929): 159-166.

⁵J. E. Downey, “Judgments on the Sex of Handwriting.” Psychological Review,
17 (1910): 205-216.

⁶J. S. Kinder, “A New Investigation of Judgments on the Sex of Handwriting.”
Journal of Educational Psychology, 17 (1926): 341-344.

⁷S. M. Newhall, “Sex Differences in Handwriting.” Journal of Applied
Psychology, 10 (1926): 151-161.


Summary and Conclusions. The study of sex differences in
penmanship was conducted in two directions: (1) a statistical
analysis of differences in penmanship quality between the sexes
and (2) an attempt to discover whether teachers could detect
differences in the handwriting of boys and girls. The advantage
discovered in average penmanship quality favored the girls. The
differences were of doubtful statistical significance. The conclusion
is that boys and girls should be taught penmanship together,
and somewhat similar successes should be anticipated for the two
sex groups from similar types and amounts of instruction.

The teachers who served as judges in detecting sex from samples
of penmanship noted the sex of the pupils correctly approximately
two times in each three instances. This is in close agreement with
the results of previous investigators of the problem. The assumption
is that there must be some characteristic difference in the results
secured when boys and girls write. The obvious inference is that
further detailed study should be made to discover whether this is
due to training, to differences in anatomical structure arising from
the different rates of bodily development of boys and of girls, or
to other factors. The conclusion, pending such a study as has been
outlined, is directly opposed to that suggested by the first study of
sex differences. It seems logical that different methods of
instruction or different amounts of instruction should be given to
boys and to girls in order to secure uniform results.





A NOTE ON THE EFFECT OF TEACHING 
ON THE RELIABILITY COEFFICIENT 
OF AN ACHIEVEMENT TEST* 


HERMAN A. COPELAND 


Cincinnati Employment Center 


KELLEY¹ has shown that since the reliability coefficient
(like other correlation coefficients) changes with the range
of talent, a consideration of the range sampled needs to be
made while interpreting such a measure of reliability. It appears
to the writer that if learning an academic subject, for instance,
either increases or decreases the differences in a group, then the
reliability of the achievement test, if indicated by means of a
correlation coefficient, likewise varies (possibly according to the
formula given by Kelley).

The following description of achievement tests and their con- 
struction suggests that a decrease in the reliability might be ex- 
pected. “A good test covers only the really important points of a 
subject. .. . The maker of a test comes to a conclusion as to what 
points, in the total amount of subject matter, are of greatest im- 
portance. He then discards those questions that seem of relatively 
little significance and keeps only those which deal with basic 
facts... . The best tests are based on very careful research as to 
the fundamental objectives in the subject concerned, and the
material is selected with reference to its importance for these
objectives.”² Here it is to be noted that an achievement test includes
only those items which presumably have been learned by everyone
after teaching has taken place. If in reality these fundamental
items are generally known, the range of talent is reduced, which
in turn decreases the correlation used to indicate the reliability of
the achievement test. To observe the accuracy of this analysis,
some reliability coefficients of achievement tests obtained by
correlating odd and even items and applying the Brown-Spearman
prophecy formula have been examined.


*This study was made while the author was University Scholar at Ohio State
University.

¹T. L. Kelley, “Reliability of Test Scores,” Journal of Educational Research,
1921, 3, 370-379; Statistical Method, 1923, 221-223.

²S. L. & L. C. Pressey, Introduction to the Use of Standard Tests, 1931, World
Book Co., pages 8-9.

The first test studied is one containing 41 items of elementary 
statistical procedures required in Educational Psychology and 
devised by Carter whose purpose was “to determine these minimal 
statistics and develop certain tests which will measure the student's 
knowledge of the different terms and his ability to perform various 
statistical computations in a common teaching situation.”³ While
Carter’s work was done carefully, it was preliminary and he re- 
ported no measure of reliability other than that to be expected 
from his method of construction. As shown in line 1 of the 
table he found a mean score of 26.6 and a standard deviation of 
6.7 with 109 students. These were secured by giving the test at 
the end of the first week of the quarter to the students in Educa- 
tional Psychology. Ordinarily approximately the first meeting of 
the quarter is spent in presenting certain tool subjects among which 
are the statistical needs of a teacher and “creating a desire in the 
students to overcome this need by learning these necessary tools” 
and then during the remainder of the week the students “master” 
this material “on their own” by using a self-instructional booklet.⁴
While it appears that some students master the material, others 
do not, as is shown by the large standard deviations. In the first 
three lines of the table we have data from Carter and two other 
instructors; means and standard deviations have been found by the 
latter similar to those reported by Carter, and from these we esti- 
mate by the use of Kelley’s formula that Carter would have ap- 
proximated the auspicious reliability of “ninety.” And then the 
test would be adequately reliable for ordinary uses—in fact almost 
good enough for individual prognosis. These last two separately 
observed reliability coefficients have been combined by using 


³H. L. J. Carter, Development of Certain Tests in Minimal Statistics for
Teachers, 1931. M. A. Thesis on file in the Ohio State University Library.

⁴L. C. & S. L. Pressey, Methods of Handling Test Scores, 1926, World Book Co.


Fisher’s z function⁵ in order to obtain as accurate a measure of
the reliability as possible. It is necessary to use z because the
distribution of r at high values is skewed.⁶

During one term the teaching procedure was varied so that 
about four hours were used in presenting and supervising the 
learning of these minimal statistics, and then testing. The decrease 
in variability as well as reliability is shown in line 4 of the table. 
The mean shows that the group was not especially “statistically” 
sophisticated. A subsequent attempt to insure thorough learning 
with a larger group resulted in a larger mean, and slight reduc- 
tion of variability and reliability. To determine if the last two 
values of the reliability coefficient could be due to chance varia- 
tions from the combined value, these correlations were trans- 
formed by using the z function. It was found that the coefficient 
in the fourth line was significantly different from the combined 
value; while that in the fifth line was suggestively, though not 
significantly, different (i.e., the difference is only about three times 
its p. e.).
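The comparisons just described can be sketched numerically. The combined value itself is not printed in the text, so the sketch below rests on assumptions: the two z values are averaged with weights of N − 3, the standard error of a z is taken as 1/√(N − 3), and the probable error as 0.6745 times the standard error. Since the coefficients are stepped-up split-half values, these standard errors can be only approximate. Function names are illustrative.

```python
import math

PE_FACTOR = 0.6745  # probable error = 0.6745 x standard error


def fisher_z(r):
    """Fisher's z transformation of a correlation coefficient."""
    return math.atanh(r)


def combine_z(coefficients):
    """Weighted mean z for a list of (r, n) pairs, with weights n - 3.

    Returns the combined z and the total weight.
    """
    weights = [n - 3 for _, n in coefficients]
    total = sum(w * fisher_z(r) for w, (r, _) in zip(weights, coefficients))
    return total / sum(weights), sum(weights)


def difference_in_pe_units(z_a, weight_a, z_b, weight_b):
    """Difference between two z values expressed in probable errors."""
    se_diff = math.sqrt(1.0 / weight_a + 1.0 / weight_b)
    return abs(z_a - z_b) / (PE_FACTOR * se_diff)


# Lines 2 and 3 of the table: r = .92 (N = 79) and r = .90 (N = 33).
z_combined, weight_combined = combine_z([(0.92, 79), (0.90, 33)])

# Line 4: r = .48, N = 27.  Line 5: r = .84, N = 51.
ratio_line_4 = difference_in_pe_units(fisher_z(0.48), 27 - 3,
                                      z_combined, weight_combined)
ratio_line_5 = difference_in_pe_units(fisher_z(0.84), 51 - 3,
                                      z_combined, weight_combined)

print(f"combined z = {z_combined:.4f}, r = {math.tanh(z_combined):.3f}")
print(f"line 4 differs by {ratio_line_4:.1f} times its p.e.")
print(f"line 5 differs by {ratio_line_5:.1f} times its p.e.")
```

Under these assumptions the coefficient in line 4 lies well beyond the customary limit, while that in line 5 falls a little short of three probable errors, in keeping with the statement in the text.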

The second test studied is a multiple response test of 200 items 
used by Smeltzer⁷ and given at the end of one quarter and at
the beginning and end of the succeeding quarter. Since he used 
an experimental and a control group, these have been tabulated 
separately and designated exp. and con. respectively. At the end 
of the first quarter with the exp. group a high mean, and low 
standard deviation and reliability were found, as shown in line 6; 
while in the next line with the con. group slight differences are 
evident—transforming the correlations into z’s showed the dif- 
ference between the reliability of the end test for the first quarter 


⁵R. A. Fisher, Statistical Methods for Research Workers, 1930 (third edition),
London, Oliver and Boyd, pages 163-171.

⁶Karl Holzinger, Statistical Methods for Students in Education, 1928, Ginn
and Co., footnote on page 238.

⁷C. H. Smeltzer, An Experimental Evaluation of Certain Teaching Procedures
in Educational Psychology, 1931. Ph.D. Dissertation on file in the Ohio State
University Library. Also “Improving and Evaluating the Efficiency of College
Instruction,” Journal of Educational Psychology, 1933, 24, 283-302. I wish to
thank Professor Smeltzer for permission to use these data which he had
collected for another purpose.


as observed from the two groups to be insignificant (only twice its
p.e.). Then these reliability coefficients were compared with the 
results of this examination used as a pre-test on the exp. and con. 
groups for the next quarter; these are shown in lines 8 and 9 
respectively. The z of the con. group’s end test in line 7 differs 
from the z of the second quarter con. group’s pre-test in line 9 
by only twice its p. e. and must not be considered as significant. 
The z for the con. group’s end test differs from the z of the exp. 
group’s pre-test; while the z for the exp. group’s end test for the 
first quarter differs from both of the z’s of the pre-test. The z’s
observed from the two pre-tests do not differ. The z observed on 
the pre-test with the exp. group differs from that on the end test 
with the same group (line 10). The z observed with the con. 
group on the pre-test does not differ from that observed at the 
end with the same group (line 11). The two reliabilities observed 
at the end do not differ significantly. (In the last four lines, since 
the same subjects took both pre- and end-tests, the correlation 
term in the formula for the probable error of the difference should 
have been evaluated—but it has not been attempted. This would 
further emphasize any differences found as scores on the pre-test 
correlate with scores on the end test .50.) 

The third test studied is a judgment test of applied information
also used by Smeltzer.⁸ From lines 12 and 13 it can be seen
that at the end of the first quarter the reliability of the test is
low, and there is no significant difference between these two
reliability coefficients. When the same test was given as a pre-test
at the beginning of the next quarter a surprising increase in
reliability is shown in lines 14 and 15 of the table. Surely no one
can deny that this test is reliable enough for individual prognosis,
yet at the end of the quarter and with the same students
disappointingly low reliability is found. Of this Smeltzer has written,
“When the judgment problem test (twenty problems in length
with one hundred items to be evaluated) was administered at the
end of the course, the coefficient of reliability was .76±.02 [Exp.
and con. groups have been combined]. This coefficient was obtained
by correlating the scores on the ten odd numbered problems
with the ten even numbered problems and corrected by the
Brown-Spearman formula. If this test had been doubled in length,
so that the number of items to be evaluated compared with the
multiple choice questions [200 as mentioned above], the coefficient
of reliability would have been raised to .86±.01 according
to the Brown-Spearman formula. When the test was administered
at the beginning of the course the coefficient of reliability was
.94±.004. It is most difficult to account for the decrease in the
coefficient from the beginning to the end of the course. The only
tentative explanation that can be given now is that the students
at the beginning of the course had to weight the solutions on
the basis of past experience, which was more or less a trial and
error process.” The author doubts whether this would increase
the reliability because reliable tests supposedly are not answered
on a trial and error basis.


⁸C. H. Smeltzer, “Objective Measurement of Applied Information,” Journal of
Applied Psychology, 1933, 17, 765-771.
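The Brown-Spearman step quoted from Smeltzer can be verified in a few lines. The sketch below uses the usual form of the prophecy formula, r_k = k·r / (1 + (k − 1)·r), where k is the factor by which the test is lengthened; the function names are illustrative and are not taken from Smeltzer.

```python
def brown_spearman(r, k=2.0):
    """Reliability expected when a test of reliability r is lengthened k times."""
    return k * r / (1.0 + (k - 1.0) * r)


def half_test_correlation(r_full):
    """Odd-even correlation implied by a stepped-up (k = 2) reliability."""
    return r_full / (2.0 - r_full)


# Reliability of the 100-item judgment test at the end of the course.
observed = 0.76

# Expected reliability if the test were doubled to 200 items.
doubled = brown_spearman(observed, k=2)

# Odd-even correlation that would have produced .76 after correction.
odd_even = half_test_correlation(observed)

print(f"doubled length : {doubled:.2f}")
print(f"odd-even r     : {odd_even:.2f}")
```

The first value agrees with the figure of .86 quoted above; the second shows how low the uncorrected odd-even correlation must have been at the end of the course.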

In every case for both of Smeltzer’s tests the reliability observed 
from the end test differs from that of the pre-test for the experi- 
mental groups; in only one case does the reliability of the control 
group differ more than that of the experimental group. Smeltzer 
has already shown that more learning occurred in the experimental 
than in the control group. 

A brief examination of the literature on the reliability of mazes 
revealed many experiments but lack of published data and large 
errors prevented any analysis, although one paper gave some 
evidence similar to what has been presented here. 

While Kelley has pointed out that the shrinkage in the correla- 
tion coefficient was to be expected, and substitution in his formula 
which includes four observed measures,⁹ all of which contain
sampling errors (making verification difficult), shows some agree- 
ment with the observed data, he does not indicate that this change 
in the range of talent may result from learning. 
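The agreement with the observed data can be illustrated by a short computation. The formula is not printed in this note; the sketch assumes the customary form of Kelley's relation, σ√(1 − r) = Σ√(1 − R), in which the two standard deviations and the two reliability coefficients are the four measures concerned. The function name is illustrative, and line 2 of the table is taken as the reference group merely for the sake of the example.

```python
def kelley_reliability(r_known, sd_known, sd_other):
    """Reliability expected in a group of different variability.

    Assumes the error of measurement is the same in both groups, so that
    sd_known * sqrt(1 - r_known) = sd_other * sqrt(1 - r_other).
    """
    return 1.0 - (sd_known / sd_other) ** 2 * (1.0 - r_known)


# Reference groups from the table: line 2 (S.D. 7.6, r = .92)
# and line 3 (S.D. 6.9, r = .90).
carter_from_line_2 = kelley_reliability(0.92, 7.6, 6.7)
carter_from_line_3 = kelley_reliability(0.90, 6.9, 6.7)

# Groups tested after more thorough learning: line 4 (S.D. 2.8)
# and line 5 (S.D. 4.6), both predicted from line 2.
predicted_line_4 = kelley_reliability(0.92, 7.6, 2.8)
predicted_line_5 = kelley_reliability(0.92, 7.6, 4.6)

print(f"Carter, from line 2 : {carter_from_line_2:.2f}")
print(f"Carter, from line 3 : {carter_from_line_3:.2f}")
print(f"line 4 predicted {predicted_line_4:.2f}, observed .48")
print(f"line 5 predicted {predicted_line_5:.2f}, observed .84")
```

On these assumptions the estimate for Carter's group is about .90 from either reference group, and the predicted values for lines 4 and 5 fall somewhat below those observed, which is the kind of partial agreement to be expected when all four measures carry sampling errors.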

After observing that a test whose reliability has been found
to be near .90 drops to as low as .48 when thorough learning takes
place, that a second test’s reliability in similar circumstances drops
from .94 to less than .90, and that of a third from .94 to .76, it
would seem that if the achievement test “covers only the really
important points of a subject,” and if these are really learned, then
the variability necessary for high correlations is so reduced that the
test appears to be unreliable. Further it would appear that a
reliability coefficient of an achievement test determined after
maximal learning had taken place would be a minimum value.


⁹Consider, too, Holzinger’s criticism of some of the assumptions underlying
its derivation. Journal of Educational Research, 1921, 4, 237-239.











Section          N       M       σ     Reliability      z

 1. Carter      109     26.6     6.7    (.90?)
 2. Cplnd        79     29.8     7.6    0.92±.01      1.5890
 3. Bhns         33     31.2     6.9    0.90±.02      1.4722
 4. Cndsum       27     34.4     2.8    0.48±.14      0.5230
 5. Cndwin       51     36.4     4.6    0.84±.03      1.2212
 6. Expend       76    169.6    13.3    0.85±.02      1.2562
 7. Conend       88    158.0    18.4    0.90±.01      1.4722
 8. Exppre       82     94.2    21.6    0.95±.01      1.8318
 9. Conpre      106     94.3    24.8    0.93±.01      1.6584
10. Expend       82    167.5    24.5    0.89±.01      1.4219
11. Conend      106    154.5    18.2    0.90±.01      1.4722
12. Expend       76     58.2    11.9    0.47±.08      0.5101
13. Conend       88     54.6    17.5    0.57±.06      0.6475
14. Exppre       82     29.8    17.9    0.95±.01      1.8318
15. Conpre      106     22.7    20.3    0.93±.01      1.6584
16. Expend       82     58.5    15.3    0.81±.03      1.1270
17. Conend      106     45.0    15.7    0.70±.04      0.8673








PITCH DISCRIMINATION AND FRENCH 
ACCENT ON THE HIGH SCHOOL LEVEL 


EMILY S. DEXTER 
Agnes Scott College 


AN EARLIER study¹ on the relation between pitch
discrimination and accent in modern languages among college
students led to such consistent and positive conclusions
that it seemed desirable to pursue the question on the high school
level. Since French is the only modern language taken by any
large proportion of students in this region this investigation deals
only with it.

The Seashore test for pitch discrimination was given to 90 
students in one local Girls’ High School, to be referred to hereafter 
as School A, and to 455 in another, to be referred to as School B. 
Ratings were also obtained from their teachers as to the pronuncia- 
tion and accent of the 90 girls in School A and the 425 in School 
B who were studying French. These ratings were in five degrees 
of excellence: 1 very high, 3 average, 5 very low, with 2 and 4 as 
intermediary steps. The scores made by the girls of School A
on the Henmon-Nelson Tests of Mental Ability, and the IQ 
registered in the office files for the girls of School B were used as 
the measures of intelligence. 

The findings of this study are gratifyingly similar to those of the 
earlier investigation. High school students to an even greater 
extent than college students tend to be rated high, average or low 
in French accent according as their ability to discriminate pitch 
varies from high to low. 

In order to compare the parts played by intelligence and by pitch 
in the securing of a good accent rating in French, median scores 
on intelligence and pitch were figured for each rating. Table I, 


¹Emily S. Dexter and Katharine T. Omwake, “The Relation between Pitch
Discrimination and Accent in Modern Languages,” The Journal of Applied
Psychology, XVIII, 267-271 (April, 1934).


in which these medians are given, indicates very clearly the trend
found. Those rated 1 in every case score higher in intelligence 
than do any others, but the other four ratings show no uniform 
gradations in intelligence medians. In the matter of pitch, how- 
ever, there is a consistent trend downward as the accent rating 
becomes lower. In other words, the rating seems to vary not so 
much or so regularly with intelligence as with ability to discrimi- 
nate pitch. The one exception to this is the case of the juniors in 
School A in which both intelligence and pitch decrease as the rat- 
ings in accent decrease. In Table I are also to be found the 
percentile ranks assigned each pitch score by the Seashore standards, 
which magnify the differences. 


TABLE I
Median scores on intelligence and pitch for each accent rating

                                    Rating in French accent
                                  1      2      3      4      5
School A Seniors
  Median intelligence score      70     63     51     65     52
  Median pitch score             83     77     71     67     64
  Percentile rank on pitch       63     32     17     11      8
School A Juniors
  Median intelligence score      70     58     50     49     42
  Median pitch score             76     72     61     60     54
  Percentile rank on pitch       29     19      5      5      2
School B Seniors
  Median IQ                     126    120    116    120    115
  Median pitch score             85     83     76     73     70
  Percentile rank on pitch       76     63     29     21     15
School B Juniors
  Median IQ                     125    123    117    122    112
  Median pitch score             81     81     76     73     69
  Percentile rank on pitch       50     50     29     21     13





That neither intelligence nor pitch discriminative ability seems 
to be the sole determining factor in accent is further indicated by 
the fact that the correlations between pitch and accent rating, and 
between intelligence and accent rating are about equal, although
those for the former pair are slightly higher than for the other. The
average correlation between pitch and accent rating is .639 and
that between intelligence and accent rating is .592. These are
calculated by the method of contingency. The correlations
between intelligence and pitch, as would be expected, run low,
averaging .197. These are calculated by the product-moment
method. See Table II for a summary of the correlations.
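For readers unfamiliar with the method of contingency, the coefficient may be sketched as follows. The study does not print its contingency tables, so the sketch assumes Pearson's coefficient of mean square contingency, C = √(χ²/(N + χ²)), and the small tables used below are purely hypothetical illustrations, not data from the study.

```python
import math


def contingency_coefficient(table):
    """Pearson's coefficient of contingency for a table of frequencies.

    `table` is a list of rows, each row a list of cell counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi_square += (observed - expected) ** 2 / expected

    return math.sqrt(chi_square / (n + chi_square))


# Hypothetical table in which the two traits are unrelated:
# every row has the same proportions, so C is zero.
unrelated = [[10, 20],
             [20, 40]]

# Hypothetical table in which the two traits agree perfectly.
perfect = [[10, 0],
           [0, 10]]

c_unrelated = contingency_coefficient(unrelated)
c_perfect = contingency_coefficient(perfect)

print(f"unrelated traits : C = {c_unrelated:.3f}")
print(f"perfect agreement: C = {c_perfect:.3f}")
```

It should be kept in mind that C cannot reach 1.00; in a square table with k classes its upper limit is √((k − 1)/k), so that coefficients obtained by this method are not strictly comparable with product-moment values.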


TABLE II

Intercorrelations between pitch, intelligence, and accent rating

               School A Seniors   School A Juniors   School B Seniors   School B Juniors
               Intel.   Accent    Intel.   Accent    Intel.   Accent    Intel.   Accent
Pitch           .136     .581      .339     .681      .091     .691      .223     .594
Intelligence             .648               .686               .556               .477





In order to get one more angle from which to see whether the 
apparent relation between ability in French and pitch discrimina- 
tion is a real one, a comparison was made between two groups as 
follows: Group I was made up of all seniors who as freshmen had 
failed French after attempting it for a year or less. The median 
IQ of this group was 109, a lower IQ than occurs as median in any 
group in Table I. Group II was composed of all seniors with an 
IQ of 109 or less who had had two or more years of French. Their 
median IQ was 103. The upper limit of this group was arbitrarily 
set at the median of Group I in order to insure that the factor of
intelligence should not be the differentiating one, and to offset any 
possible contention that the low IQ was the predominating cause 
of failure in French. There were 15 in each group. The pitch 
median for Group I was 70, i.e., percentile 12, while for Group II, 
those with two or more years of French, the pitch median was 84, 
i.e., percentile 70. Of course neither group was large but there 
were no others eligible. Since the average IQ of Group II was 
definitely lower than that of Group I, it seems safe to conclude that 
a student with somewhat low intellectual ability for high school 
work and without good pitch discrimination is unlikely to succeed
in French. See Table III.


²H. E. Garrett, Statistics in Psychology and Education, pp. 195-203.


TABLE III
Comparing pitch discrimination of students failing French (Group I)
with those having studied it two or more years (Group II)

             IQ        IQ        Pitch     Range of      Pitch     Pitch
             Range     Median    Range     Middle 50%    Median    Average
Group I      90-130    109       50-83     62-78         68        69
Group II     94-109    103       61-91     80-86         84        80


Two of the seniors with French referred to in Table III were
very low in pitch, 61 and 62 (rated 5 and 4 respectively). It is 
their scores that bring the pitch average so much below the pitch 
median. It is to be observed that the lower extreme of the middle 
50 per cent of Group II is higher than the higher extreme of the 
corresponding range of Group I, as also their median is higher 
than the highest score made by those failing French. That the 
difference of 11 points between the two averages is a reliable one 
is indicated by the fact that the chances are 100 in 100 that the true 
difference is greater than zero; i.e., complete certainty with a little 
to spare. When the difference between the two medians is tested 
for reliability it is found to be 60 per cent greater than is needed 
for complete certainty.³


Conclusions: 
1. There is a good correlation between intelligence and accent 
rating in French, averaging .592.


2. There is a good correlation between pitch discrimination and 
accent rating, averaging .639. 


3. Ability to discriminate pitch contributes as much as intel- 
ligence to the securing of a good French accent. 


4. Comparatively low intellectual ability for high school work 
accompanied by good pitch discrimination seems to result in 
reasonably successful work in French; whereas correspondingly 
low intelligence accompanied by low ability to discriminate pitch 
leads to failure in French. 


³Ibid., pp. 128-137.


NEWS AND NOTES


It is with great reluctance that this JOURNAL announces an increase in
the subscription price from $5.00 per year to $6.00, beginning with
1935, Volume 19. In part this is due to the fact that we have changed
printers following the enactment of the National Recovery Act. Nearly
all of our subscribers no doubt will understand why this is true. It is
due of course chiefly to the increase in the cost of labor. We changed
printers in order to improve the quality and style of the printing in
the JOURNAL. Most of the many judges from whom we have heard
readily admit that we have been successful. In 1927 the JOURNAL
changed from a quarterly to a bi-monthly. This was done without
increasing the subscription price.

The average number of pages per volume for the seven years prior 
to 1927 was 460. The average number of pages for the seven years 
prior to 1934 was 660. We have changed printers more than once 
with the hope of reducing the charges which we yet have to make for 
offprints, engravings and special type matter to the contributors. We 
believe that the great majority of subscribers will desire, on knowing 
the facts, to share in reducing the costs of publication to those who 
have already spent their time and energy in making the scientific
investigations represented in the manuscripts printed in our JOURNAL.

We have made computations to find out the median price of twenty-five
other psychological journals. This is found to be approximately
$6.00. Our schedule for the coming issues of our JOURNAL calls for
the printing of between 700 and 800 pages per volume. The median
number of pages printed in these twenty-five other journals is about
500 per volume. We believe that if all the facts were known our
subscribers would be willing to pay even a larger subscription price
than the one which we have decided upon for the present.


Over 780,000 or 2 per cent of American children under 14 are 
exceptional children and constitute a serious problem demanding 
special study and attention in the interest of America’s future. The 
problem child, the slow child, the child with reading and speech 
difficulties must get a chance in present day America. His only chance 
is under special scientific guidance and instruction. 

This was brought out at the First Institute on the Exceptional Child
at The Woods Schools of Langhorne, Pennsylvania, at which prominent
psychiatrists, psychologists, neurologists, pediatricians, and
educators met Tuesday, November 13, to discuss the exceptional child.

Formerly located in Roslyn, Pa., The Woods Schools have long 
been prominent in Bucks County, Pa., and occupy the estates of the 
late Judge John M. Patterson, of Howard Reifsynder, and of the 
former Julia Bullitt in Langhorne. Mollie Woods Hare, of the Schools, 
was for many years active in the movement for special classes in the 
public schools of Philadelphia, and in court work with juvenile 
delinquents. 

Announcement was made of the establishment of a Child Research 
Clinic of the Woods Schools to give to the general public information 
based on the Schools’ broad experience in the field of the exceptional 
child. 


The Printing Office of the Yale University Press has just announced 
that the Yale Clinic of Child Development in cooperation with Erpi 
Picture Consultants, Inc., 250 W. 57th Street, New York City, is pre- 
pared to supply to those interested the Yale Films of Child Develop- 
ment. These films in fact are scientific and educational, and are 
published in sound and silent versions. They portray the splendid and 
thorough work of Dr. Gesell and his co-workers almost without an 
equal as scientific studies of the life and growth of the human infant. 
The records are systematic and normative and have been obtained over 
long periods of time. They cover such fields as posture, locomotion, 
prehension, manipulation, attentional regard, exploitive and adaptive 
behavior; also sleep, waking, feeding, bath, play, bodily activities, and 
social behavior. Norms are established for comparative and diagnostic 
studies of such problems as individuality, personality characteristics 
and constitutional trends. 


The Southern California Branch of the American Eugenics Society, 
through its President, Dr. Paul Popenoe, has sent to us its first News 
Letter, Vol. 1, No. 1, October, 1934. Beginning in 1922, this national 
Society was incorporated in 1926. In 1929 this California Branch was 
established. The latter holds monthly meetings, conducts essay con- 
tests, protects from attack the three-day-intention-to-wed law, and de- 
votes attention to eugenical legislation, notably sterilization. Inasmuch 
as California leads all other states combined in such legislation there 
are probably many of our readers who will be keenly interested in re- 
ceiving copies of these News Letters. Those interested may write to 
either the President or S. Wayne Evans, Secretary, Room 331, 607 
S. Hill Street, Los Angeles, California.


BOOK REVIEWS


Starch, Daniel. Faith, Fear and Fortunes—Why We Have Booms and 
Depressions. Must we endure them again? New York, R. R. Smith, 
1934, iii-226. 

It is about time we had a book on the psychology of the depression. 
The result will impress some readers as rather meager, centering as it 
does around the one main point implied in the subtitle but it may be 
that that is about all there is to it. If the book merely calls attention, 
in strategic places, to the importance of psychological factors it may 
have fulfilled its mission. A detailed, constructive, psychological pro- 
gram without further consideration and investigation might be pre- 
mature. The present book represents at least a start in the right 
direction. 

The main theme of the book is that we have overlooked the psycho- 
logical factors in booms and depressions, particularly the over- 
enthusiasm in the former and the undue fear in the latter. This theme 
is woven through a somewhat historical account of depressions from 
Pharaoh’s time to the present. The author quotes from business and 
economic journals showing a failure to recognize these psychological 
factors and the persistence of a stubborn hope. He shows how we de- 
pend upon forecasters and then when these prophets fail, confusion 
results. When large numbers focus their attention on one thing, such 
as the stock market, we have essentially mob action. Some social 
psychologists may object to the use of this term when the mob is not 
actually in personal contact. His point is that under these circum- 
stances we are apt to go somewhat more to emotional extremes. The 
whole matter of supply and demand he considers seventy-five per cent 
psychological and indicates that if we were entirely rational, prices 
of stocks, for instance, would bear a certain fairly constant relation to 
their earnings. However, with buyers going to extremes of optimism 
and fear, we have extreme fluctuations in the market. The author’s 
reasoning on these points seems essentially sound from a psychological 
standpoint but the reviewer is not equipped to evaluate the economic 
aspects and their relative importance in comparison with the psycho- 
logical. 

Some interesting comments are given on the psychology of the 
New Deal. The various alphabetical agencies have certain mnemonic 
value and the Blue Eagle is essentially a trade mark. The various
NRA parades were psychological means of selling people the general
idea. The chief executive’s personality—the poise and cheerfulness 
usually manifested in photographs, his informality in press conferences, 
his accessibility to congressmen and his simple and sympathetic dis- 
cussions over the radio all contributed to a favorable public attitude. 
When business thought there was some relief from the strain there 
was an overflow of confidence which led to rather wild activity on the 
Exchange followed by a reversal as men began to worry about seeming 
bureaucratic action. The complications of the securities act produced 
some confusion and fear. The battle among the experts as to the 
monetary policy added further confusion and indecision to the 
average business man who cannot appreciate the intricacies of the 
problem. 

In the last chapter the author attempts constructive suggestions for 
the prevention of difficulties like the present. Among his suggestions 
are the following: (1) Mass education in psycho-economic principles. 
Inasmuch as fear is based on ignorance, such a national program of 
education might produce a group of people who acted from reason 
rather than emotion in times when excess of emotion might be dis- 
astrous. (2) A supreme economic council to “formulate a sound 
psycho-economic philosophy,” serving without compensation and 
largely in advisory capacity. The author goes so far as to suggest not 
only the makeup but the personnel for such a council. He lists the 
ex-president, 45 business leaders, 9 economists, 5 psychologists, 4 social 
and religious workers, 2 political scientists, 3 labor representatives, 
2 lawyers, 2 sociologists, and 2 educators. The psychologists sug- 
gested are Angell, Scott, Seashore, Terman, and Thorndike—a selec- 
tion with which their colleagues would not quarrel. (3) Limiting 
value of collateral for speculative loans. (4) Higher standards for 
banks, because failure of numerous banks even though they are small 
has a bad psychological effect when the rest of the country hears about 
the number that has failed. (5) Privately-planned cooperation in man- 
agement. Here the author stresses more consideration of what the 
prospect wants,—the same keynote that has been sounded by Link 
and other psychologists. (6) Creating new things to do,—which is 
not an original point at all. (7) Having thinkers in every business 
who will take the long-range point of view and maintain a somewhat 
detached philosophical attitude. 

In the last few pages the author again stresses the desires and emo- 
tions as fundamental to the whole problem. He feels that profit is 
still the most compelling motive, that if there is lack of confidence and
fear regarding the future, no amount of credit will be effective. The
desire to buy is more potent than the ability to buy and the book closes 
with a reiteration of the same thesis that booms and depressions are 
phenomena of mass psychology. 

The book will serve its purpose in calling attention to some of the 
psychological factors in the present situation of the world as well as the 
failure of the economists to agree with one another or to present 
wholly satisfactory solutions of the difficulties. Whether the remedies 
which Starch suggests would take care of the matter is debatable. He 
is thinking not merely of getting out of the present difficulty but rather 
of avoiding its recurrence. To this end his suggestions of an educa- 
tional program and a group of experts somewhat more representative 
than the present Brain Trust seem to have some merit. As a con- 
tribution to applied psychology the book may not loom very large but 
it states a problem. The reviewer shares what he considers the author’s 
implicit hope that ere long psychologists may sit around the council 
table in high places. 

Harold E. Burtt
Ohio State University 


Raymond C. Perry. A Group Factor Analysis of the Adjustment 
Questionnaire. Southern California Education Monographs, No. 
5. Los Angeles: University of Southern California Press, 1934.
93 pages.


The monograph is an abridgment of a doctorate dissertation at the 
University of Southern California. Briefly, the problem which Dr. 
Perry attacked is indicated by the following question: how many 
independent factors are being measured by the adjustment question- 
naire, and in which questionnaires are these factors found? Because 
the field of personality testing is so large, an effort was made to limit 
the investigation to this major problem and to attempt a solution 
through the use of appropriate testing instruments in the field of the 
study. The attack was entirely of a statistical nature, involving the use 
of correlation, Thurstone multiple factor analysis, and Spearman tetrad 
methods. 

The author tested a total of 328 students in the Long Beach, Cali- 
fornia, Junior College, with the following instruments: the Thurstone 
Psychological Examination, the Laird Personal Inventories B-2 and 
C-2, the Allport A-S Reaction Study, the Bernreuter Personality In- 
ventory series, the Iowa High School Content Examination, and the 
Pressey X-O Tests. The author reports that the experimental group
was normal as compared to groups used in the standardization of the
tests and measures used in the study. 

The findings of the study appear in Chapter V as answers to eighteen 
specific questions, together with an answer to the major problem 
question. Stripped of technical verbiage, the findings of signifi- 
cance are as follows: (1) There are four independent variables 
measured by the tests used in the study. (2) The groups found are 
(A) a group ordinarily called either “introversion” or “neurotic tenden- 
cies” which the study shows to be synonymous; (B) a group found 
in tests claiming to measure “sufficiency,” “dominance,” or “ascend- 
ence;” (C) a group found in tests claiming to measure “dominance” 
and “ascendence;” and (D) a group found in tests claiming to measure 
“intelligence” and “achievement.” “Sufficiency” enters into the 
makeup of this factor. The author identifies the instruments which 
measure these groups best. 

This study is of significance because of its scope, and because of 
the care with which it was done. It has long been known that per- 
sonality traits are important in affecting achievement, since personality 
is a driving force causing gifted persons to use the ability which they 
possess productively. The study should be of considerable interest to 
counsellors in secondary schools and colleges because of the nature of 
its findings. 

Perhaps a word should be said about the volume itself. It is at- 
tractively bound. The printing is excellent. The context is supported 
by footnote references at appropriate points, and the volume has a 
considerable bibliography. Apparently the same care was exercised 
in preparing the monograph for publication as was evidenced during 
the study itself. 

M. E. Broom 
State Teachers College 
San Diego, Calif. 


Ives Hendrick. Facts and Theories of Psychoanalysis. New York, 
Alfred A. Knopf. 1934. 308p. $3.00. 


“Psychoanalysis is the science of the unconscious functions of the 
mind and personality, developed by Sigmund Freud and his stu- 
dents.” Thus in the first sentence of his book, Dr. Hendrick com- 
pletely delimits the field he is going to discuss. In the preface he says 
the purpose of the book is “informative, to assist those with an intelli- 
gent interest to understand how the analyst himself regards his own 
work, and why.” It is not for the reviewer to question the merits of
the field delimited, but to judge how well the purpose of the book
is fulfilled. 

The fulfillment of the stated purpose is not an easy task. For ex- 
ample, Part I, concerned with the Facts of Psychoanalysis, considers 
the Unconscious, punishment phantasies, and psychosexuality. Here 
there is much case and anecdotal illustration. But even the author 
must admit in the last pages of the section that the raw materials of 
psychoanalysis are the “utterances, actions and feelings of patients” 
which for “coherent explanation” require “threading on propositions 
which, in the strictest sense, are not incontestable facts.” The reader 
would like to know what are “facts” for psychoanalysis—the proposi- 
tions which comprise the subject matter of the section, or the observed 
data which are introduced only as illustration. 

In Part II, dealing with Psychoanalytic Theory, we have a well con- 
densed and well organized presentation of Freud’s theoretical positions 
at the various stages in the development of psychoanalysis. It is one of 
the clearest expositions of these theories that has yet appeared in Eng- 
lish. It should certainly be read by all psychologists who are not al- 
ready sufficiently prepared to answer intelligently the questions of 
students which are inevitably asked in even elementary courses. 

The author disclaims any intention of writing a textbook for train- 
ing psychoanalytic therapists, and in the last section he outlines the 
training program necessary, yet one’s reaction to Part III on Psychoanalytic
Therapy is that it is not complete. True, much is said about
transference, the transference neurosis, and resistance; about symptom- 
atic indications for analysis and the analysability of patients; there 
is a discussion with a statistical table of therapeutic results of analysis 
and mention is made of the analysis of children. But one wonders 
about mechanical details of the analysis—times, keeping of records, 
how much the analyst has to say and so on. These things are probably
to be known only by the elect. 

Chapter IX, comprising Part IV, the present status, gives a brief 
account of the spread of the professional movement; of the educational 
requirements for therapists; and of the relation of psychoanalysis 
and psychiatry. Jung, Adler, and Rank are spoken of briefly only in 
their relation to Freud. A glossary of psychoanalytic, psychiatric and 
medical terms, and a selected, annotated bibliography for further 
reading add considerably to the value of the book for the lay reader. 

In answer to the question “Has this book fulfilled its purpose?” 
the reviewer, at least, would unhesitatingly answer, “Yes.” For a 
brief, non-technical presentation of a field which has a formidable
literature, much of it controversial, this book is an excellent piece of
work.

C. M. Louttit
Indiana University


Studies in the Psychology of Art. Christian A. Ruckmick, editor and 
Norman C. Meier, director. University of Iowa Studies In Psy- 
chology, 1933, No. 18. Psychological Review Co. Princeton, N. J. 
188 pp. 


The eleven studies in the Psychology of Art are the result of work 
done at the University of Iowa under the direction of Dr. Norman C. 
Meier. Eight studies were master’s theses, two were theses for the 
doctorate and one a post-doctorate investigation. 

Because the problems of art talent and emergence are relatively new 
most of the investigations are of a preliminary nature with tentative or 
negative results. The general confusion in psychology regarding the 
meaning of “intelligence” is increased by adding such concepts as 
“aesthetic intelligence.” 

The studies by Daniels and Jasper indicated that pre-school children 
respond to such art factors as symmetry and graphic rhythm, proving 
that these art elements emerge early in the life of the individual. 
Whorley developed a test of compositional unity employing various 
groupings of Plastilina trees, bird-bath and an entrance. The relia- 
bility of the test was .60 while no data were given regarding validity. 
Williams constructed a test of color harmony by having children pick 
out from various colored scarfs, one that best matched various colored 
dresses on a doll. This test when applied by Walton to 600 children 
and adults proved to have no validity. Grippen made a close study of 
four talented and five artistically disinterested children, all under eight 
years of age, with regard to creative imagination and concluded that 
artistic talent is evidenced by creative rather than imitative productions, 
criticism of own work, higher intelligence, better attention and greater 
perseveration. Dow studied the playground behavior of six artistically 
superior and five artistically inferior children, the differentiation be- 
tween the two groups being made by tests, teacher estimates and art 
exercises, and found that on equipped playgrounds artistic children 
show more response to physically inactive play materials while on un- 
equipped playgrounds they are more active and sociable than non- 
artistic children (p. 90). Rodgers made a study of twenty-seven ar- 
tistically superior and twenty-six artistically disinterested children and 
attempted to determine the influence of environment upon each. By
means of check lists, interviews and photographs the artistic influence
of the home environment was estimated and found “not a vital factor 
in the explanation of variation in artistic competence of children.” 
(p. 106). Tiebout purported to discover what psychophysical func- 
tions differentiate the artistically superior from the artistically inferior 
child. Sensory, motor, perceptual, memory and imaginative tests were 
given to eleven artistically superior and twelve artistically inferior 
children. Significant differences were found between the two groups 
with the artistic group excelling in: observation, recall, imagination, 
originality, form discrimination and feature discrimination. Dreps 
also attempted to discover some psychophysical capacity (psycho- 
physical is taken in a very wide sense in both of these studies) from 
tests of aesthetic judgment, imagination, visual memory, motor control, 
neurotic tendencies and others. Twenty-seven subjects were divided 
into three groups corresponding to three levels of artistic excellence. 
Although no reliabilities of the differences between the groups are 
given it is concluded that there is no significant superiority in favor of 
either group in any of the tests given. Jacobson studied aesthetic 
factors in costume designs and found the same factors present as seem 
to have been found in other forms of art, namely, proportion, balance, 
rhythm and emphasis. 

Altogether one feels that a beginning has been made in the subject 
of art analysis and much valuable preliminary work has been accom- 
plished. Further work with more cases, more reliable and valid tests, 
and more refined methods should lead to important results in this 
important and attractive new field of investigation. The University 
of Iowa Laboratory has been a pioneer in initiating and developing 
the fertile fields of research in music and art. Other contributions are 
promised soon. 

Raleigh M. Drake
Wesleyan College 
Macon, Georgia. 


Kellogg, W. N., and Kellogg, L. A.: The Ape and The Child. Whitt- 
lesey House, McGraw-Hill Book Co., New York and London, 1933. 
pp. 341, 20 Figures and 80 Photographs. 


This attractively bound, well-printed and illustrated volume is an 
interesting but scientific account of what happens to a chimpanzee 
female child, aged 7½ months, when brought to live with a human
family one member of which, Donald, is only about 2½ months older
than the young ape. Study of this book reveals that, although the
purpose of the authors was avowedly to make an incidental rather than
a systematic investigation of environmental influence upon early be- 
havior, valuable contributions are made to several fields of modern 
psychology—comparative (animal), child, evolutional, genetics, ex- 
perimental and social. Perhaps this well-planned study is even more 
so because the ready enthusiasm of one author was so well checked 
with resistance from the other. 

Chapter I, An Experiment Outlined—includes a brief statement of 
wild traits possessed by such children as the “Boy of Aveyron,” Kaspar
Hauser and the “wolf girls” of India. While it is recognized that 
these children may have been mentally deficient prior to their life in an 
animal environment, the provocative question raised is, “Why not give 
one of the higher primates exactly the environmental advantages which 
a young child enjoys and then study the development of the resulting 
organism?” This entirely human environment is to be psychological 
as well as physical. Once more the age-old problem of Heredity and 
Environment, Nature and Nurture, is attacked in this instance by 
methods of observation and experiment scientifically controlled. 

Thirty-one body measurements on the ape and the child showed 
relative rates of maturation of 19 and 11 per cent. The former was 
superior in body dexterity and muscular coordination. Gua, the ape, 
was as shown by X-ray photographs the equal of a child of more than 
twice the boy’s age. The latter had a longer reaction time. Careful 
observation and measurement of reflexes as bits of innate behavior 
pointed to differences that are probably inborn and closely correlated 
with more matured structures, for example, the backward movement 
of the pinna of the ear in the ape at a harsh shrill noise. 

With the same careful attention as bestowed on the average child 
Gua showed only a little more susceptibility to colds or illness. After 
drinking or eating from her spoon or cup she wiped her mouth with 
the back of her hand or forearm and there was some evidence of 
thumb sucking! In drowsiness and sleep her behavior was so char- 
acteristically human as to suggest a close bond with mankind. 

Both the child and the ape displayed on repeated later testing prefer- 
ential tendencies for one hand or the other; one foot or the other. Gua, 
like adult apes, was inferior in finely coordinated finger movements. 
Her finest coordinations were with the lips. There were a few in- 
stances in a variety of tests such as picking up a dime in which she 
achieved an approximation to the human thumb-and-finger pincer 
movement. At 9 months she almost equalled Don at 18½ months in
removing a skull-cap from her head. In walking upright Gua exhibited
so many differences and difficulties as to lead these authors to
conclude: “If we are willing to agree that bodily build is hereditary, 
then there is no escaping the argument that walking is ‘native’ to the 
extent that it is influenced by the size and proportions of limbs. . . . 
To say, however, that the behavior or the movements or the activity 
of walking or climbing itself is inherent is, we think, going too far.” 
This chapter (IV) as well as many others is well illustrated with 
photographs and drawings. 

Gua showed early evidence of dislike for intense visual stimuli. 
She observed carefully small objects while Don appeared to perceive 
larger general situations. To both two-dimensionality was something 
new. Gua’s jumping accuracy was proof of ability in distance percep- 
tion. She was excellent in responding to weak sounds and superior 
to the boy in going blindfolded to the source of sound. Gua’s sense 
of equilibrium was superior. She at 14 months spontaneously played 
at whirling and spinning. Don was both more forceful as well as more 
differential to separate taste stimuli. Interestingly enough the ape 
employed olfactory stimuli to identify objects and individuals but was 
never observed to follow a scent. So-called pleasant odors (perfume) 
were less pleasant to her than to the boy. She smiled and laughed 
more than the boy on being tickled and during the nine months her 
sensitivity so increased that she was frequently seen to tickle herself 
and laugh as a result. She was especially sensitive to temperature 
changes, being more disturbed by temperatures below average. Para- 
doxically she did not vocalize in pain as did the boy. 

The major part of waking time for both was spent in behavior only 
to be classified as play. The ape in two weeks deliberately learned 
to throw objects over the edges of her high chair. She played with her 
feet like a child. A self-play of both was getting wholly or partially 
inside of boxes, baskets, etc. Gua’s play more often than Don’s was
based on stimulation of the inner ear—whirling, hanging. The boy 
differed significantly in explorative and manipulative play with new 
toys. Imitative play was less clearly pronounced in the ape. His 
direct mimicry of her was probably the most convincing evidence. 

From the first mutual curiosity and interest in each other was more 
marked in the boy. At their second meeting the ape bestowed ex- 
ploratory kisses on the face of the boy. The ape’s reception of adults 
was more timid and distrustful but probably as women’s skirts were 
the more readily clung to she was usually less at ease with men. To 
explain the ape’s seeking out and dependence on social and affectionate 
behavior the authors emphasized both her physical and psychological
dependence in her strange environment. Such was her docility and
cooperation that she proved the easier experimental subject to handle. 
In obedience tests she gave the impression of change of behavior only 
to satisfy her superiors. Subsequently she would deliberately go 
behind objects when some secret sin was to be committed. Each of 
these immature organisms demonstrated clearly the possession of an 
elementary capacity to understand social situations. 

Emotional behavior, pleasantly toned, was in evidence in the un- 
vocalized laughter of the ape at about 8 months while the boy already 
had a full-fledged laugh response. At first tickling, then being whirled,
later feints at tickling and finally spontaneous play wholly by them- 
selves elicited genuine laughter. Emotions, unpleasantly toned, in the 
ape were most frequently fear, the tantrum being the most distinctive 
feature. Loss of support and being left alone were the chief stimuli 
of fear. Thirteen chimpanzees were tested with toadstools and other 
control objects to determine the cause of Gua’s terror of toadstools. 
The final word is that this fear is a special case of the ape’s wariness 
of the strange and unknown. Gua was more agile, also more cautious 
and more jealous than the boy. 

The bashfulness of the boy if asked to perform before strangers had 
no recognizable counterpart in the ape. She had the more violent 
appetites and emotions, was coarser and more elemental. 

In the study of learning systematic drill was purposely avoided. 
At 10½ months the ape learned to unhook window screens and open
a door by turning a knob. Both subjects early gave evidence of typical 
psychological conditioning. In learning to remove a loop from the 
hand the ape showed considerable superiority. The child adapted 
more rapidly to the changed element in the foot-in-loop test. In the 
suspended cookie test as well as the hoe experiment in which later in- 
struction is given to each by the experimenters the ape shows herself 
superior in rate of learning. The child, however, is a more versatile 
and continuous imitator, the implications of the word “ape” to the 
contrary notwithstanding. 

Memory and recognition were tested by a modification of Hunter’s 
delayed reaction method. This and the response to vocal commands 
incline the authors to conclude that the child had a better auditory 
memory while the ape was superior in muscular or kinaesthetic mem- 
ory. If her human associates donned new garments the ape for some 
weeks in the changed environment found it impossible to recognize 
them. 

Intelligent behavior consists of sudden insight into the solution of
the problem set as shown by change in the attack. The ape was
ordered repeatedly to sit on a stool. She very much wanted to be 
near the experimenter. After being caught several times she abruptly 
pushed the stool to his side. In detailed detour experiments each 
subject had about as many correct solutions but the boy revealed similar 
indecision much less frequently. From monthly applications of 
Gesell’s Tests for Pre-School Children the final score was for the boy, 
23; for the ape, 15. Curves for each successive monthly testing are 
strikingly parallel; the ape slightly superior at the beginning; the child 
slightly to somewhat so from the second month. 

Both subjects were susceptible to praise as a motivating factor; much 
more so than to punishment or blame. 

The difference in vocal communication was in favor of the child. 
For a time it seemed that he too employed voice intonation, unarticu- 
lated, largely as an emotional reaction. He was superior in vocal 
imitation. The ape’s vocalizations are classified into four main groups:
1. The Bark; 2. The Food-bark; 3. The Screech or Scream; 4. The 
Oo-oo Cry. From daily attempts for several months to train the ape 
to say such words as “papa” the authors conclude that it is unlikely 
any anthropoid ape will ever be taught to say more than half a dozen 
words. Based on the responses of each through the entire nine months, 
the boy is found to have a “comprehensive vocabulary” of 107; the ape, 
95. A table of some 44 pages for each gives in detail the date re- 
corded, word or phrase spoken and the response. 

In the final chapter, “Conclusions,” the authors state that in heredities 
the ape and the child were similar enough to permit similar reactions 
to the same stimuli; the environment was the activating factor. The 
child was not necessarily superior in those differences toward which 
he was favored; nor was the ape deficient in those toward which he 
was unfavored. Under Differences Favorable to the Child, Likenesses
between the Two, and Differences Favorable to the Ape, the non- 
environmental, environmental and unclassified items revealed that 
human influences were of slight significance when the ape was typi- 
cally unlike the child. The psychological environment was the more 
significant factor in the young ape’s advancement. The concept of 
“maturation” cannot completely explain her achievements. 

From the first the authors knew that both subjects should have 
been in age just at life’s beginning; the ape should have lived with 
more than one child, and the study should have been made over a 
longer period. There are references for further reading by chapters 
and a list of tests and experiments. It is a well-planned study, well
interpreted, well set forth with very commendable scientific accuracy,
thoroughness and judgment. 


James P. Porter 
Ohio University